Efficient searches using boolean operators

petermuller
Explorer

I'm trying to optimize my Splunk searches to keep them as quick as possible. Is there any appreciable difference in search time or efficiency between the following two searches? In other words, does condensing the logic make searches faster, or does it not matter in cases like these?

index=* NOT a NOT b NOT c
index=* NOT (a OR b OR c)

Both of these are logically equivalent because of the implied ANDs in the first query ((NOT a) AND (NOT b) AND (NOT c)), so I was curious if there was any major timing difference in the two queries.

1 Solution

martin_mueller
SplunkTrust

When you look at the search job inspector, you'll see debug messages at the very top. For both your examples they read the same:

DEBUG: base lispy: [ AND [ NOT a ] [ NOT b ] [ NOT c ] index::* ]

Both searches are turned into the same lispy expression, so Splunk is doing the same thing underneath and there cannot be a timing difference.
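The logical equivalence itself is just De Morgan's law, and it can be checked exhaustively. Here is a minimal Python sketch with made-up function names. It says nothing about how Splunk evaluates a search, only that the two boolean forms agree for every combination of a, b and c:

```python
from itertools import product


def chained_nots(a: bool, b: bool, c: bool) -> bool:
    # NOT a NOT b NOT c, with the implied ANDs written out
    return (not a) and (not b) and (not c)


def grouped_not(a: bool, b: bool, c: bool) -> bool:
    # NOT (a OR b OR c)
    return not (a or b or c)


def forms_agree() -> bool:
    # Compare both forms over all 8 combinations of truth values
    return all(
        chained_nots(*values) == grouped_not(*values)
        for values in product([False, True], repeat=3)
    )


print(forms_agree())
```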

As a general optimization note, the NOT operator can be slow in many situations. For example, when you run this:

index=_internal NOT log_level=INFO

You can see Splunk is scanning many events for only a few matches. Looking at the debug info, you see this:

DEBUG: base lispy: [ AND index::_internal ]

This means Splunk was not able to use any filter beyond selecting the index, because the search contains no word that the index structure could be used to look up. Loading only the events without the word "info" wouldn't be correct, because that word could appear somewhere other than in the field log_level.

On the other hand, running this search is faster:

index=_internal NOT INFO

The debug info shows that Splunk is using some index structures to look only for events that don't have the word info in them, and avoids loading the events that do contain it off disk:

DEBUG: base lispy: [ AND index::_internal [ NOT info ] ]

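These two searches obviously aren't equivalent: NOT INFO also drops events where the word info appears somewhere other than in the field log_level.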
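To make the difference concrete, here is a toy model of a token index in Python. It is a sketch under loud assumptions, not Splunk's actual index format or segmentation rules: the sample events, the crude tokenizer, and names such as build_token_index are all invented for illustration. It shows why a bare-word NOT can rule events out from the index alone, while a field-based NOT has to read every event:

```python
import re
from collections import defaultdict

# Hypothetical sample events; the last one mentions "info" outside log_level.
EVENTS = [
    "log_level=INFO component=Metrics msg=ok",
    "log_level=ERROR component=Tailer msg=failed",
    "log_level=WARN component=Tailer msg=more info needed",
]


def tokenize(event):
    # Crude stand-in for segmentation: lowercase runs of letters, digits, "_"
    return set(re.findall(r"[a-z0-9_]+", event.lower()))


def build_token_index(events):
    # Map each token to the set of event ids that contain it
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for token in tokenize(event):
            index[token].add(event_id)
    return index


def not_term_candidates(index, n_events, term):
    # NOT <term>: the ids to load are all ids minus the term's posting list,
    # so events containing the term are never read
    return set(range(n_events)) - index.get(term.lower(), set())


def not_field_value(events, field, value):
    # NOT field=value: the token index cannot decide this, so every event
    # is read and the field is extracted before comparing (toy comparison,
    # case-insensitive here)
    pattern = re.compile(rf"\b{re.escape(field)}=(\S+)")
    kept = []
    for event_id, event in enumerate(events):
        match = pattern.search(event)
        if match is None or match.group(1).lower() != value.lower():
            kept.append(event_id)
    return kept


index = build_token_index(EVENTS)
print(sorted(not_term_candidates(index, len(EVENTS), "INFO")))
print(not_field_value(EVENTS, "log_level", "INFO"))
```

In this toy data the WARN event has "info" in its message text, so the bare-word search discards it while the field-based search keeps it. That is the reason the field-based search cannot take the token shortcut.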

petermuller
Explorer

Thanks! I'll keep those in mind!
