Efficient searches using boolean operators

Explorer

I'm trying to keep my Splunk searches as fast as possible. Is there any appreciable difference in search time or efficiency between the following two searches? More generally, does condensing the boolean logic make a search faster, or does it not matter in cases like these?

index=* NOT a NOT b NOT c
index=* NOT (a OR b OR c)

Both of these are logically equivalent because of the implied ANDs in the first query ((NOT a) AND (NOT b) AND (NOT c)), so I was curious whether there is any major timing difference between the two queries.
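The equivalence claimed above is De Morgan's law, and it can be checked by brute force over every combination of matches. This is only a sketch in Python, not Splunk code: a, b, and c are booleans standing for "the event matches that term", and the function names are made up for illustration.

```python
from itertools import product


def separate_nots(a, b, c):
    # NOT a NOT b NOT c, with the implied ANDs written out
    return (not a) and (not b) and (not c)


def grouped_not(a, b, c):
    # NOT (a OR b OR c)
    return not (a or b or c)


# Try all 8 combinations of "event matches a / b / c" and
# collect any combination where the two forms disagree.
mismatches = [
    combo
    for combo in product([False, True], repeat=3)
    if separate_nots(*combo) != grouped_not(*combo)
]

print(mismatches)  # prints []
```

An empty list means the two forms agree on every possible event, so any timing difference would have to come from how the search is executed, not from the logic.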

1 Solution

SplunkTrust

When you look at the search job inspector, you'll see debug messages at the very top. For both your examples they read the same:

DEBUG: base lispy: [ AND [ NOT a ] [ NOT b ] [ NOT c ] index::* ]

There cannot be a timing difference because Splunk's doing the same thing underneath.

As a general optimization note, the NOT operator can be slow in many situations. For example, when you run this:

index=_internal NOT log_level=INFO

You can see that Splunk scans many events to return only a few matches. The debug info shows this:

DEBUG: base lispy: [ AND index::_internal ]

This means Splunk was not able to apply any filter beyond selecting the index, because the search contains no word that the index structure could use to narrow things down. Skipping every event that contains the word "info" wouldn't be correct, because that word could appear somewhere other than in the field log_level.

On the other hand, running this search is faster:

index=_internal NOT INFO

The debug output shows that Splunk uses the index structures to find only the events that don't contain the word info, and avoids loading the events that do contain it off disk:

DEBUG: base lispy: [ AND index::_internal [ NOT info ] ]

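These two searches obviously aren't equivalent, though: the second one also excludes events where info appears anywhere outside the field log_level.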
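To make the difference concrete, here is a toy model in Python. It is an assumption-laden sketch, not how Splunk stores or searches data: the three events, the tokenize helper, and the plain dictionary used as a word index are all invented for illustration.

```python
import re

# Made-up events; only the idea of a log_level field comes from the thread.
events = [
    {"log_level": "INFO", "raw": "log_level=INFO component=Metrics"},
    {"log_level": "WARN", "raw": 'log_level=WARN message="see info page"'},
    {"log_level": "ERROR", "raw": "log_level=ERROR component=Indexer"},
]


def tokenize(raw):
    # Hypothetical tokenizer: lowercase words made of letters, digits, underscores.
    return set(re.findall(r"[a-z0-9_]+", raw.lower()))


# Toy word index: word -> ids of the events that contain it.
index = {}
for event_id, event in enumerate(events):
    for word in tokenize(event["raw"]):
        index.setdefault(word, set()).add(event_id)

all_ids = set(range(len(events)))

# "NOT INFO": answered from the word index alone, no event is loaded.
not_info_word = all_ids - index.get("info", set())

# "NOT log_level=INFO": the word index cannot answer this, because "info"
# may occur outside log_level, so every event is loaded and checked.
not_info_field = {i for i, e in enumerate(events) if e["log_level"] != "INFO"}

print(sorted(not_info_word))   # prints [2]
print(sorted(not_info_field))  # prints [1, 2]
```

Event 1 has log_level=WARN but mentions "info" in its message, so the word-based filter drops it while the field-based filter keeps it. That is the same trade-off as above: the word filter can skip events cheaply, but it answers a different question.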

Explorer

Thanks! I'll keep those in mind!
