With the simplest search:
index=checkpoint action=accept | head 1
The normalizedSearch (under Job Inspect, 8.34s) is:
litsearch index=checkpoint ( ( ( sourcetype=opsec_audit ) AND ( ( ( ( ( ( sourcetype=WinRegistry ) AND ( ( registry_type=accept ) ) ) OR ( ( sourcetype=fs_notification ) AND ( ( action=accept ) ) ) ) OR ( vendor_action=accept ) ) ) ) ) ) OR ( ( ( ( sourcetype=fe_json ) AND ( ( "alert.action"=accept ) ) ) OR ( ( sourcetype=fe_xml ) AND ( ( "alerts.alert.action"=accept ) ) ) OR ( ( source="/nsm/bro/logs/current/notice.log" ) AND ( ( EXTRA_FIELD_18=accept ) ) ) ) OR ( action=accept ) ) | litsearch index=checkpoint action=accept | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | prehead limit=1 null=false keeplast=false
A slight modification of the search to put the field search after the first pipe makes the junk go away:
index=checkpoint accept | search action=accept | head 1
The normalizedSearch (under Job Inspect, 3.1s) is now:
litsearch index=checkpoint accept | search action=accept | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | prehead limit=1 null=false keeplast=false
This is true even when the value "accept" does not appear before the first pipe.
Why does Splunk insert junk into the normalized search when a field search appears before the first pipe? The junk increases search time, and in some cases involving "NOT" or "!" it can even return no results.
FYI, I checked this on my system and it adds completely different baggage, and much less of it (also much less strange). I assume that search performance differs significantly between the two as well?
Correct. As shown above, 3.1s vs. 8.34s for pulling just one event (head 1).
On my local search head (on my laptop) tied to the corporate license, I search:
index=checkpoint action=accept | head 1
litsearch index=checkpoint ( action=accept OR ( sourcetype="Script:TimesyncStatus" ) ) | litsearch index=checkpoint action=accept | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | prehead limit=1 null=false keeplast=false
TimesyncStatus belongs to Splunk_TA_windows, which is installed on this search head. Why does this normalized search carry so much less baggage? Could it be that other TAs are complicating matters on the other search head? How should I proceed to discover what is causing this, and why?
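One way to discover which apps contribute is to enumerate every automatic lookup, together with the file (and therefore the app) that defines it, using btool. A sketch, assuming a *nix install under $SPLUNK_HOME:

```shell
# List every LOOKUP- setting in the merged props.conf, with --debug
# showing which app's conf file each line comes from.
$SPLUNK_HOME/bin/splunk btool props list --debug | grep "LOOKUP-"
```

From that list you can see which sourcetypes define automatic lookups whose output fields match your search terms.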
Yes, it is a function of the number of TAs you have installed and the configuration they contain. Whenever an app (TA, SA, etc.) contains an automatic lookup - no matter what the app is - and Splunk can determine that the lookup has an output field named the same as one of your search terms, it will attempt a reverse lookup. The more apps you have, and the more automatic lookups those apps define, the more of this activity you'll see.
THIS DOES NOT MEAN that you should disable those automatic lookups. They are there for a reason: to make data CIM compliant. Breaking how TAs make your data CIM compliant in order to gain search performance is a bit like burning down the house to get rid of the spider.
If you need to fix the performance problems, then I would suggest discussing with support how this can be made better, without breaking CIM.
The "junk" is an expansion of reverse lookups that happens because of the various CIM-compliant TAs you have installed. I'll explain the concept of reverse lookups, but first let's go back to automatic lookups. Say you have a sourcetype bob defined with an automatic lookup. In props.conf:
[bob]
LOOKUP-actions = boblookup someinputfield OUTPUT action
And in boblookup.csv you have:
someinputfield,action
potato,deny
tomato,accept
blueberry,accept
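For completeness, the LOOKUP- setting above references a lookup definition by name; a minimal sketch of the corresponding transforms.conf stanza (the stanza and file names here just mirror the example):

```
# transforms.conf
[boblookup]
filename = boblookup.csv
```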
Now your normal expectation is that when you search on sourcetype=bob, events will be matched against the lookup and get a new field named action whenever someinputfield has a value of potato, tomato, or blueberry. But this same mechanism can be applied in reverse, too.
If you search on action=accept, then Splunk can look through all of its config files and reason out something like this: sourcetype bob has a lookup that outputs a field named action based on this CSV file, and the CSV file shows that action=accept is returned whenever someinputfield=tomato or someinputfield=blueberry. So there is an equivalency here:
( sourcetype = bob AND ( someinputfield = blueberry OR someinputfield = tomato ) )
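Putting the pieces together: with that equivalency in hand, Splunk can rewrite the original search into something it can evaluate against raw events without running the lookup. A sketch of what the expansion looks like (compare the litsearch output above, which keeps the original term OR'd on the end):

```
index=whatever action=accept
```

becomes roughly:

```
index=whatever ( ( sourcetype=bob AND ( someinputfield=blueberry OR someinputfield=tomato ) ) OR action=accept )
```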
This is the fundamental step of a reverse lookup - the goal is to attempt to make automatic lookup fields searchable. This is a necessary evil for CIM-compliant apps like Enterprise Security because of how often they use automatic lookups to normalize field names and values.
There's a whole longer discussion to be had about the performance impacts of this. While it made your example slower, there are many counterexamples where this approach (up to a point) speeds things up.
So are you saying that I should find lookup tables defined in auto-lookups with the "action" field? And if I disable the auto-lookup, will the junk go away?
What if I have a lookup on src against a table listing all src IPs and their hostnames, defined for [host::*]? Will that include every src IP in the table as junk in front of the normalizedSearch for every sourcetype defined in inputs.conf? Or am I completely misunderstanding your explanation here?
Also, I looked in my auto-lookups and noticed fs_notification (among the many things having "action" field in them) in this line:
fs_notification : LOOKUP-action_for_fs_notification endpoint_change_vendor_action_lookup vendor_action OUTPUT action
Also, I imagine that Splunk_TA_windows is widely used, so most people will be seeing this issue. Do we need to disable that particular entry in the app to stop it messing with our searches?
I'm not necessarily suggesting disabling anything - that's up to you, but you'll need to understand how that may affect existing searches. If you do have a lookup defined on [host::*], then yes, there's a chance for reverse lookups to have weird effects. There are settings in limits.conf that attempt to place boundaries on the behaviour, though.
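As a sketch of the kind of boundary setting involved - the exact setting name and default should be verified against the limits.conf.spec for your Splunk version, as this is an assumption on my part:

```
# limits.conf (sketch; verify against your version's limits.conf.spec)
[lookup]
# Caps how many matching rows a reverse lookup will expand a
# search term into before giving up on the rewrite.
max_reverse_matches = 50
```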
Reverse lookups are a bit of an obscure topic, so a support case may be a good idea in order to better understand what you can do with them.
Also, I'm not sure that measuring time-to-first-event is necessarily the best way to compare performance. Part of the goal of the reverse-lookup approach is to turn dense searches into less dense ones. Doing a | head 1 on a dense search is almost always going to be quick, but it doesn't demonstrate how long it takes to get back all of the events.
Lastly, a .conf2015 session explained that field aliases and calculated fields on CIM fields also cause this issue (in addition to auto-lookups).