Why does Splunk add junk to the front of a search when a field search is defined before the first pipe?

landen99
Motivator

With the simplest search:

    index=checkpoint action=accept | head 1

The normalizedSearch (under Job Inspect, 8.34s) is:

    litsearch index=checkpoint ( ( ( sourcetype=opsec_audit ) AND ( ( ( ( ( ( sourcetype=WinRegistry ) AND ( ( registry_type=accept ) ) ) OR ( ( sourcetype=fs_notification ) AND ( ( action=accept ) ) ) ) OR ( vendor_action=accept ) ) ) ) ) ) OR ( ( ( ( sourcetype=fe_json ) AND ( ( "alert.action"=accept ) ) ) OR ( ( sourcetype=fe_xml ) AND ( ( "alerts.alert.action"=accept ) ) ) OR ( ( source="/nsm/bro/logs/current/notice.log" ) AND ( ( EXTRA_FIELD_18=accept ) ) ) ) OR ( action=accept ) ) | litsearch index=checkpoint action=accept | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | prehead limit=1 null=false keeplast=false

A slight modification of the search to put the field search after the first pipe makes the junk go away:

    index=checkpoint accept | search action=accept | head 1

The normalizedSearch (under Job Inspect, 3.1s) is now:

    litsearch index=checkpoint accept | search action=accept | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | prehead limit=1 null=false keeplast=false

This is true even when the value "accept" is not before the first pipe.

Why does Splunk insert junk into the normalized search when a field search comes before the first pipe? The junk increases search time, and in some cases where "NOT" or "!" is used, it can return "no results".

Splunk Version: 6.2.3
Splunk Build: 264376
Current App: Search & Reporting

1 Solution

dwaddle
SplunkTrust

The "junk" is an expansion of reverse lookups that happens because of various CIM-compliant TAs you have installed. I'll explain the concept of reverse lookups, but first let's go back to automatic lookups. Say you have a sourcetype bob defined with an automatic lookup. In props.conf:

    [bob]
    LOOKUP-actions = boblookup someinputfield OUTPUT action

And in boblookup.csv you have:

    someinputfield,action
    potato,deny
    tomato,accept
    blueberry,accept

Now your normal expectation is that when you search on sourcetype=bob, events will be matched against the lookup and gain a new field named action whenever someinputfield has a value of "tomato", "potato", or "blueberry". But this same thing can be applied in reverse too.
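To make the forward direction concrete, here is a minimal Python sketch of what an automatic lookup does conceptually. It is only an illustration of the idea, not how Splunk implements it. The helper names (load_lookup, apply_lookup) and the sample events are hypothetical; only the CSV contents come from the example above.

```python
import csv
import io

# Contents of boblookup.csv, as given in the example above.
BOBLOOKUP_CSV = """someinputfield,action
potato,deny
tomato,accept
blueberry,accept
"""


def load_lookup(text, input_field, output_field):
    """Return a dict mapping each input-field value to its output-field value."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[input_field]: row[output_field] for row in reader}


def apply_lookup(event, table, input_field, output_field):
    """Return a copy of the event, adding output_field when the input value is in the table."""
    enriched = dict(event)
    value = event.get(input_field)
    if value in table:
        enriched[output_field] = table[value]
    return enriched


table = load_lookup(BOBLOOKUP_CSV, "someinputfield", "action")

# Hypothetical events, invented purely for illustration.
events = [
    {"sourcetype": "bob", "someinputfield": "tomato"},
    {"sourcetype": "bob", "someinputfield": "carrot"},
]
enriched_events = [apply_lookup(e, table, "someinputfield", "action") for e in events]
```

The first event gains an action field because "tomato" is in the CSV; the second is left unchanged because "carrot" is not.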

If you search on action=accept, then Splunk can look through all of its config files and reason out something like this:

Sourcetype bob has a lookup that outputs a field named action based on this CSV file. I see here in the CSV file that action=accept is returned whenever someinputfield=blueberry or someinputfield=tomato. So there is an equivalency here:

    ( sourcetype=bob AND ( someinputfield=blueberry OR someinputfield=tomato ) )

This is the fundamental step of a reverse lookup - the goal is to attempt to make automatic lookup fields searchable. This is a necessary evil for CIM-compliant apps like Enterprise Security because of how often they use automatic lookups to normalize field names and values.
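The reasoning above can be sketched in Python as a rough model of the expansion. This is an assumption-laden illustration, not Splunk's actual algorithm: the function names (reverse_lookup_clause, expand_term) are hypothetical, and the only grounded details are the CSV contents and the fact that the expanded search in the question still ends with the original ( action=accept ) term.

```python
import csv
import io

# Contents of boblookup.csv, as given in the example above.
BOBLOOKUP_CSV = """someinputfield,action
potato,deny
tomato,accept
blueberry,accept
"""


def reverse_lookup_clause(sourcetype, csv_text, input_field, output_field, wanted):
    """Build the clause equivalent to output_field=wanted for one sourcetype.

    Returns None when no CSV row produces the wanted output value.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    inputs = [row[input_field] for row in reader if row[output_field] == wanted]
    if not inputs:
        return None
    alternatives = " OR ".join(f"{input_field}={value}" for value in inputs)
    return f"( sourcetype={sourcetype} AND ( {alternatives} ) )"


def expand_term(field, wanted, lookups):
    """OR together one clause per lookup, then keep the original term as a fallback."""
    clauses = [
        reverse_lookup_clause(sourcetype, csv_text, input_field, field, wanted)
        for sourcetype, csv_text, input_field in lookups
    ]
    clauses = [clause for clause in clauses if clause is not None]
    clauses.append(f"( {field}={wanted} )")
    return " OR ".join(clauses)


# One (sourcetype, CSV text, input field) entry per automatic lookup that outputs "action".
expanded = expand_term("action", "accept", [("bob", BOBLOOKUP_CSV, "someinputfield")])
print(expanded)
```

With a single lookup the result is short. With many TAs installed, each contributing its own lookup or alias for action, the same loop yields one clause per sourcetype, which is roughly the shape of the long normalizedSearch in the question.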

There's a much longer discussion to be had here about the performance impact of this. While it made your example search slower, there are many counterexamples where this approach (up to a point) speeds things up.
