Splunk Search

Best Search Performance when adding filtering of events to query

Na_Kang_Lim
Path Finder

I am looking for the best way in terms of performance when adding filtering of certain events for security rules. Normally for a security rule, it starts off with quite a large scope, for example:

index=windows source=XmlWinEventLog:Security process_name=ipconfig.exe

Then, in a real environment, you often have to filter out benign processes and behaviors. Currently, this is how I write the filters:

index=windows source=XmlWinEventLog:Security EventCode=4688 process_name=ipconfig.exe
| search NOT process_command_line="ipconfig /all"
| search NOT process_parent_path=*benign.exe host=BENIGN_HOSTS

This gives the best readability, but I am looking for best performance.
Then what is the best way to write filters? 


bowesmana
SplunkTrust

There are many ways to look at search performance, particularly for Windows event log data. You should get comfortable with the search job properties page. In particular, look at the phase1 search and the scanCount. scanCount is the number of events that were scanned to return the results.

With Windows event log data in particular, you should understand that for the search to know whether the process_name field is what you want, it has to look at all events, because process_name is a field that is mapped by the Windows TA at search time.

Minimising the number of events you look at (scanCount) will always help performance. Look at this presentation that shows how to use TERM() effectively. That can be a significant benefit to your searches. 

https://conf.splunk.com/files/2020/slides/PLA1089C.pdf

For example, your initial search

index=windows source=XmlWinEventLog:Security process_name=ipconfig.exe

can most likely be significantly improved just by writing

index=windows source=XmlWinEventLog:Security TERM(ipconfig) process_name=ipconfig.exe

because instead of pulling every event out to see whether the Windows TA has mapped a piece of the raw event to the process_name field, it will ONLY look at the events that have the term ipconfig in the raw event. Given that ipconfig is likely to be a less frequently used command, your scanCount should drop significantly.

In the search log from the inspect job page, search for LISPY and you can see how the parser has interpreted your search.

In your other example of != vs NOT, take a look at the phase0 search in the job properties. You will no doubt see a significant difference in the expanded search.

There are other forms of "filters", such as subsearches and lookups, but I would say that there is not often a one-size-fits-all approach to optimising your searches. It frequently depends on your data and the event count and cardinality of values you get back for fields you're trying to exclude.

Lookups are often a good way to filter data, particularly when your data is still being searched in the index tier, i.e. before a transforming command has sent the data to the search head.

So, it can be more efficient to do this type of logic

index=windows source=XmlWinEventLog:Security 
| lookup process_names.csv process_name OUTPUT is_this_one_i_want
| where isnotnull(is_this_one_i_want)

which will then keep only the events whose process_name is in your lookup. To drop the events that match the lookup instead, test with isnull().

Note that this is a poor example, as it would grab all events and then filter, but the point is that it can be more efficient to first limit your data set in the primary search and then filter using a lookup to remove other events, rather than writing a really complex set of conditions up front.
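
As a rough sketch of that idea (the lookup file benign_parents.csv and its is_benign column are made-up names for illustration, not something from your environment), the primary search narrows the events first and the lookup then removes the known-benign ones:

```spl
index=windows source=XmlWinEventLog:Security EventCode=4688 TERM(ipconfig) process_name=ipconfig.exe
| lookup benign_parents.csv process_parent_path OUTPUT is_benign
| where isnull(is_benign)
```

This assumes benign_parents.csv has a process_parent_path column holding the exact parent paths to exclude, plus an is_benign column. A plain CSV lookup matches exact values, so wildcard entries would need a lookup definition configured for wildcard matching.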

PrewinThomas
Motivator

@Na_Kang_Lim 

Normally, using multiple | search NOT commands:

  • Forces Splunk to post-process results after initial retrieval
  • Doesn't leverage indexed fields for early filtering
  • Can be slower, especially on large datasets

Adding to what @gcusello mentioned: instead of chaining multiple !=, use NOT process_name IN (a.exe, b.exe). It's a bit faster and cleaner on larger datasets and complex queries.


Regards,
Prewin

isoutamo
SplunkTrust
Here https://conf.splunk.com/files/2019/slides/FN1003.pdf is something to read. It explains how Splunk sees and uses fields when it searches events in buckets.

gcusello
SplunkTrust

Hi @Na_Kang_Lim ,

Don't use the search command after the main search.

For best performance, put the search terms as far left as possible:

index=windows source=XmlWinEventLog:Security EventCode=4688 process_name=ipconfig.exe NOT process_command_line="ipconfig /all" NOT process_parent_path=*benign.exe host=BENIGN_HOSTS
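
One thing to check in that line: NOT applies only to the expression immediately after it, so host=BENIGN_HOSTS is treated as a positive filter and only events from those hosts are kept. If the intent is to exclude the benign parent only on those hosts (that is an assumption about the rule, and BENIGN_HOSTS is still the placeholder from the question), group the pair in parentheses:

```spl
index=windows source=XmlWinEventLog:Security EventCode=4688 process_name=ipconfig.exe NOT process_command_line="ipconfig /all" NOT (process_parent_path=*benign.exe host=BENIGN_HOSTS)
```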

Then, if possible, can you replace your exclusive filters with inclusive ones?

In other words: use process_command_line IN (value1, value2, value3) instead of NOT ...

Ciao.

Giuseppe

Na_Kang_Lim
Path Finder

Hi, I know that inclusive is the best case, but I am talking about when you have to start off with broad scope.

Is there any other syntax I should follow?

Like is there any difference in performance between writing

process_name!=a.exe process_name!=b.exe

 and

NOT process_name IN (a.exe, b.exe)

gcusello
SplunkTrust

Hi @Na_Kang_Lim ,

You get different results:

Using "!=", you get all events where process_name differs from the value, but only events where the process_name field is present.

Using "NOT", you exclude events with process_name=value, but you also include events that have no process_name field.

For more information, see https://docs.splunk.com/Documentation/Splunk/9.4.2/Search/NOTexpressions
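
A quick way to see the difference for yourself is a throwaway test with makeresults. This is only a sketch; the three generated events and the field n are invented for the demo. One event gets process_name=a.exe, one gets c.exe, and one has no process_name at all:

```spl
| makeresults count=3
| streamstats count AS n
| eval process_name=case(n=1, "a.exe", n=2, "c.exe")
| search process_name!=a.exe
```

With the last line as written, only the c.exe event should come back. Swap it for | search NOT process_name=a.exe and the event without a process_name field should be returned as well.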

Ciao.

Giuseppe
