Splunk Search

Why do I get different Search Behavior depending on fields.conf?

mzorzi
Splunk Employee

I'm running a search based on a field extracted at search time using props.conf.

I've noticed that if I don't have a fields.conf, my search works fine. If I create a fields.conf and specify INDEXED=true, however, I have to add a * at the end of the value I'm searching for.

An example:

My props.conf looks like:

    [myst]
    EXTRACT-dbflds = ^\\[?< locationid>.\*?\] \[?< hostname>.\*?\] \[?< database>.\*?\] \[?< instance>.\*?\] \[?< pid>.\*?\] \[?< thread>.\*?\] \[(.\*?)\]

My source file is like:

    [2010-04-09 17:29:51,085] [asia123] [bighost] [dbprod] [pango] [pid675] [open.connection] 
    [2010-04-09 18:49:52,063] [europe345] [smallhost] [dbdev] [acaia] [pid987] [close.transaction] 

When there is no fields.conf at all, or there is a fields.conf with INDEXED=false in every field stanza, my search:

    sourcetype=myst instance=pango

works correctly.

If there is a fields.conf and I specify INDEXED=true, however, the previous search doesn't return any results, but this one does:

    sourcetype=myst instance="pango*"

Why the different behavior?

2 Solutions

jrodman
Splunk Employee

The setting INDEXED=true for a field, which is not set by default, means that the field was created at indexing time and is actually stored in a special way in the index (specifically, the string instance::pango is indexed). Since your fields are created via search-time extractions, this setting is incorrect. When you add a wildcard to the value, Splunk apparently abandons the requirement that the field be locatable as an index-time field (though this surprises me).

In short, this setting is simply not correct for your configuration, which is why it does not work. Realize that the value strings are indexed regardless (INDEXED_VALUE=true), so there isn't really an expected performance cost to leaving INDEXED unset.
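
As a rough illustration of the point above, here is a minimal sketch of the kind of fields.conf stanza that would trigger this behavior, alongside the correction. The question did not include the actual fields.conf, so the stanza shown is an assumption based on the `instance` field from the example search.

```ini
# fields.conf (hypothetical sketch; the original file was not posted)

# Presumed problematic stanza, shown commented out. It tells Splunk that
# "instance" is an index-time field, so instance=pango is looked up as the
# indexed term instance::pango, which a search-time extraction never creates.
# [instance]
# INDEXED = true

# Correction: delete the stanza entirely, or keep the default value.
[instance]
INDEXED = false
```

With either correction in place, the plain search `sourcetype=myst instance=pango` should behave as it did with no fields.conf at all.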


Lowell
Super Champion

This is technically more of a comment than an answer (both jrodman and gkanapathy covered the topic well), but comments have limited formatting, so I'm posting it here.

BTW, it looks like your regex got scrambled when you posted your question.

With that said, you may want to consider the following regex tweaks:

  • Drop the trailing unnamed group, which probably isn't necessary. (Or give it a name if you want it.)
  • Using non-greedy dot matching is normally a good thing (when compared to a simple .*), but it can still be costly because of backtracking within the regex engine. It would probably be more efficient to say "stop matching once you find a ]". As long as you don't have nested square brackets, you should be able to match your values using [^\]]+ rather than .*?.

This should work well for you:

    EXTRACT-dbflds = ^[^\]]+\] \[(?<locationid>[^\]]+)\] \[(?<hostname>[^\]]+)\] \[(?<database>[^\]]+)\] \[(?<instance>[^\]]+)\] \[(?<pid>[^\]]+)\] \[(?<thread>[^\]]+)\]
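
To show how this regex lines up with the data, the sketch below places it under the `[myst]` stanza from the question and traces the first sample event through it by hand. The stanza name and sample line come from the question; the field values in the comments are worked out from the pattern, not taken from Splunk output.

```ini
# props.conf (sketch, reusing the [myst] sourcetype from the question)
[myst]
EXTRACT-dbflds = ^[^\]]+\] \[(?<locationid>[^\]]+)\] \[(?<hostname>[^\]]+)\] \[(?<database>[^\]]+)\] \[(?<instance>[^\]]+)\] \[(?<pid>[^\]]+)\] \[(?<thread>[^\]]+)\]

# Sample event:
#   [2010-04-09 17:29:51,085] [asia123] [bighost] [dbprod] [pango] [pid675] [open.connection]
#
# The leading ^[^\]]+\] consumes the timestamp bracket without capturing it.
# The named groups then capture:
#   locationid = asia123
#   hostname   = bighost
#   database   = dbprod
#   instance   = pango
#   pid        = pid675
#   thread     = open.connection
```

Note that the last bracketed value, which the original regex matched with an unnamed group, is captured here as `thread`.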

Just some thoughts. (Hopefully this will actually be formatted correctly when it gets posted...)

jrodman
Splunk Employee

Wow, regex optimization tips. I've always meant to do some experimentation with timing regex behavior but doing it in a performant enough language to get tight results seemed too boring.

gkanapathy
Splunk Employee

You should not be setting anything in fields.conf for search-time extracted fields. Setting INDEXED=true tells Splunk to look for your field as a separately stored and indexed field. It won't be unless it was index-time extracted and stored. (Which, BTW, is rarely recommended.)

The fact that it works when you add the wildcard is either a bug or a special-casing of wildcard behavior.
