I'm running a search based on a field extracted at search time using props.conf.
I've noticed that without a fields.conf my search works fine. But if I create a fields.conf and specify INDEXED=true, I have to add a * at the end of the value I'm searching for.
An example:
My props.conf looks like:
[myst]
EXTRACT-dbflds = ^\\[?< locationid>.\*?\] \[?< hostname>.\*?\] \[?< database>.\*?\] \[?< instance>.\*?\] \[?< pid>.\*?\] \[?< thread>.\*?\] \[(.\*?)\]
My source file is like:
[2010-04-09 17:29:51,085] [asia123] [bighost] [dbprod] [pango] [pid675] [open.connection]
[2010-04-09 18:49:52,063] [europe345] [smallhost] [dbdev] [acaia] [pid987] [close.transaction]
When there is no fields.conf at all, or there is a fields.conf with INDEXED=false in every field stanza, my search:
sourcetype=myst instance=pango
works correctly.
However, if there is a fields.conf in which I specify INDEXED=true, my previous search doesn't return any results, but this one does:
sourcetype=myst instance="pango*"
Why the different behavior?
The setting INDEXED=true for a field, which is not set by default, means that the field was created at indexing time and is actually stored in a special way in the index (specifically, the string instance::pango is indexed). Since your fields are created via search-time extractions, this setting is incorrect. When you add a wildcard to the value, Splunk is apparently abandoning the requirement that it be locatable as an index-time field (though this surprises me).
In short, this setting is simply not correct for your configuration, which is why it does not work. Keep in mind that the value strings are indexed regardless (INDEXED_VALUE=true), so you should not expect a performance cost from leaving INDEXED unset.
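To make the mismatch concrete, here is a rough toy model in Python. It is not how Splunk is implemented: the tokenizer, the indexed_terms set, and the function names are all assumptions made up for illustration. The only thing taken from the explanation above is that INDEXED=true makes the search look for the term instance::pango, which was never written to the index for a search-time field.

```python
import re

RAW_EVENT = ("[2010-04-09 17:29:51,085] [asia123] [bighost] [dbprod] "
             "[pango] [pid675] [open.connection]")

# Assumption: the index holds only plain terms from the raw event.
# There is no "instance::pango" term, because the field is extracted
# at search time and was never stored as an indexed field.
indexed_terms = set(re.findall(r"[A-Za-z0-9]+", RAW_EVENT.lower()))


def term_to_look_up(field, value, indexed):
    """Return the index term a search for field=value would need (toy model)."""
    return f"{field}::{value}" if indexed else value


def event_is_candidate(field, value, indexed):
    """True if the needed term exists in the toy index."""
    return term_to_look_up(field, value, indexed) in indexed_terms


print(event_is_candidate("instance", "pango", indexed=False))  # True
print(event_is_candidate("instance", "pango", indexed=True))   # False
```

The wildcard case is deliberately left out of this sketch, since the thread itself does not pin down why it behaves differently.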
This is technically more of a comment than an answer (both jrodman and gkanapathy covered the topic well), but comments have limited formatting, so I'm posting it here.
BTW, it looks like your regex got scrambled when you posted your question.
With that said, you may want to consider the following regex tweaks:
You are using non-greedy matching (.*? rather than .*), but this can still be costly because of backtracking within the regex engine. It would likely be more efficient to say "stop matching once you find a ]". As long as you don't have nested square brackets, you should be able to match your values using [^\]]+ rather than .*?
This should work well for you:
EXTRACT-dbflds = ^[^\]]+\] \[(?<locationid>[^\]]+)\] \[(?<hostname>[^\]]+)\] \[(?<database>[^\]]+)\] \[(?<instance>[^\]]+)\] \[(?<pid>[^\]]+)\] \[(?<thread>[^\]]+)\]
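If you want to sanity-check that pattern outside Splunk, a quick Python sketch like the one below runs it against the two sample lines from the question. One assumption to note: Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern is translated accordingly. The extract helper and FIELDS list are just illustrative names.

```python
import re

SAMPLE_LINES = [
    "[2010-04-09 17:29:51,085] [asia123] [bighost] [dbprod] [pango] [pid675] [open.connection]",
    "[2010-04-09 18:49:52,063] [europe345] [smallhost] [dbdev] [acaia] [pid987] [close.transaction]",
]

FIELDS = ["locationid", "hostname", "database", "instance", "pid", "thread"]

# Skip the leading timestamp group, then capture each bracketed value with
# a negated character class instead of a lazy .*? match.
PATTERN = re.compile(
    r"^[^\]]+\]" + "".join(rf" \[(?P<{name}>[^\]]+)\]" for name in FIELDS)
)


def extract(line):
    """Return a dict of field values, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


for line in SAMPLE_LINES:
    print(extract(line))
```

For the first sample line this yields instance=pango, and for the second instance=acaia, which is the mapping the search in the question relies on.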
Just some thoughts. (Hopefully this will actually be formatted correctly when it gets posted...)
Wow, regex optimization tips. I've always meant to do some experimentation with timing regex behavior but doing it in a performant enough language to get tight results seemed too boring.
You should not be setting anything in fields.conf for search-time extracted fields. Setting INDEXED=true tells Splunk to look for your field as a separately stored and indexed field. It won't be one unless it was extracted and stored at index time (which, BTW, is rarely recommended).
The fact that it works when you add the wildcard is either a bug or a special-casing of wildcard behavior.