In most cases, search-time fields perform better than parse-time
(indexed) fields, whether or not they are explicitly configured.
When you run a search that includes a term such as fieldname=value,
Splunk treats it as a search-time field by default, unless
fieldname is explicitly configured as an indexed field in
fields.conf. This is true both for configured fields (delimiters,
regular expressions) and for automatically identified fields
where, e.g., you have fieldname:value in the text of your event. We
call this automatic handling code auto-kv, for automatic key-value
extraction.
The Splunk search machinery presumes that value will be present in
the events as an indexed string, and will apply the same mechanics to
filter the events as if you entered the string directly without the
fieldname or equals sign. For most patterns, this offers all the
performance advantage of a parse-time field, and none of the penalty.
The tradeoffs are discussed in more detail in "About indexed field
extraction" in the Getting Data In Manual.
In all cases, a post-filtering step is then applied to the (hopefully)
small set of events that actually contain the value string: Splunk
applies any extraction mechanisms, then tests whether the field was
created with the desired value.
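
As a rough illustration of the two phases described above (index-based
filtering on the bare value, then post-filtering on the extracted
field), here is a minimal Python sketch. It is a toy model, not how
Splunk is actually implemented: the events, the inverted index, and the
names tokenize, build_index, extract_fields and search are all
hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy events; real Splunk events and index structures differ.
EVENTS = [
    "user=alice action=login status=ok",
    "user=bob action=logout status=ok",
    "user=ok action=login status=failed",
]

def tokenize(event):
    """Split an event into lowercase terms, loosely like indexed strings."""
    return set(re.findall(r"[a-z0-9_]+", event.lower()))

def build_index(events):
    """Map each term to the set of event positions that contain it."""
    index = defaultdict(set)
    for pos, event in enumerate(events):
        for term in tokenize(event):
            index[term].add(pos)
    return index

def extract_fields(event):
    """Stand-in for search-time extraction: parse key=value pairs."""
    return dict(re.findall(r"(\w+)=(\w+)", event))

def search(events, index, fieldname, value):
    # Phase 1: index-based filtering on the bare value string.
    candidates = sorted(index.get(value.lower(), set()))
    # Phase 2: extract fields only from the candidates and test the field.
    return [events[p] for p in candidates
            if extract_fields(events[p]).get(fieldname) == value]

INDEX = build_index(EVENTS)
RESULT = search(EVENTS, INDEX, "status", "ok")
```

In this sketch, searching for status=ok first narrows to every event
containing the term ok (including the one where ok is the user), and
only then runs extraction to discard the event whose status is not ok.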
Ideally, the index-based filtering is the most important factor in the
speed of your search, but there are cases where search-time extraction
must be applied to a large percentage of events. For example, if
almost all of your events contain the word xml but only a small portion
have this value in the storage_format field, the speed of extraction
becomes important. Delimiter-based extractions are quite fast, as are
auto-kv extractions. Regex-based extractions are slower. Sourcetypes
with a very large number of regexes, or very inefficient regexes, can
be slower still.
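
To make the xml example concrete, the following Python sketch counts how
many events survive a term filter versus how many actually match the
field. The event data and the extract_fields helper are made up for
illustration, and the simple split-based parser only stands in for a
delimiter-based extraction.

```python
import re

# Hypothetical data: nearly every event mentions "xml" somewhere,
# but only one has storage_format=xml.
events = [f"msg=parsed xml payload id={i} storage_format=json"
          for i in range(98)]
events.append("msg=stored id=98 storage_format=xml")
events.append("msg=stored id=99 storage_format=csv")

def extract_fields(event):
    """Stand-in delimiter-based extraction: split on spaces, then on '='."""
    return dict(pair.split("=", 1) for pair in event.split() if "=" in pair)

# Term filter: events that contain "xml" as a token anywhere.
candidates = [e for e in events if "xml" in re.split(r"[\s=]+", e)]

# Post-filter: extraction has to run on every candidate.
matches = [e for e in candidates
           if extract_fields(e).get("storage_format") == "xml"]
```

Here the term filter removes almost nothing, so extraction runs on
nearly the whole data set to find a single match, which is the situation
where the cost of each extraction dominates.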