Search Query Optimizing

kunalsingh
Engager

Please help me optimize this Splunk query:

index:: rasp_

NOT [inputlookup Scanners_Ext.csv | fields forwarded_for]

NOT [inputlookup Scanners_Int.csv | rename ip_addr AS forwarded_for | fields forwarded_for]

NOT [inputlookup vz_nets.csv | rename netblock AS forwarded_for | fields forwarded_for]

NOT (forwarded_for="140.108.26.152" OR forwarded_for="" OR forwarded_for="10.*" OR forwarded_for=null) app!="" app!="\"*\"" app!="VASTID*" host!="10.215*" host!="ip-10-*" host!="carogngsa*" host!="carogngta*" host!="carofuzedd*" host!="*ebiz*" host!="echo*" host!="not logged" host!="onm*" host!="tfnm*" host!="voip*" host!="wfm*" category!="Config*" category!="Depend*" category!="Stat*" category!="Large*" category!="Uncaught*" category!="Unvalidated Redirect" category!="License" category!="*Parse*" action=*

| stats count

bowesmana
SplunkTrust

Optimising this will depend on your data. Using subsearches with lookups can be expensive and using NOT with subsearches, even more so.
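To see why the subsearches can be costly, it may help to look at what one of them expands to. This is a minimal sketch that reuses the lookup and field names from the question; appending `format` to the subsearch body shows the filter expression that gets inserted into the outer search.

```spl
| inputlookup Scanners_Ext.csv
| fields forwarded_for
| format
```

The result is a single `search` field of the shape `( ( forwarded_for="..." ) OR ( forwarded_for="..." ) ... )`, with one clause per lookup row, so the outer search has to carry a term for every entry in the file. Subsearches are also subject to result and runtime limits (10,000 results and 60 seconds by default, as far as I know), so a very large lookup may be truncated without an obvious error.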

Depending on the number of entries in those lookups, you will probably be better off using the lookup command instead of subsearches, e.g.

index:: rasp_ NOT (
    forwarded_for="140.108.26.152" OR 
    forwarded_for="" OR 
    forwarded_for="10.*" OR 
    forwarded_for=null)
    app!="" app!="\"*\"" app!="VASTID*" 
    host!="10.215*" host!="ip-10-*" host!="carogngsa*" host!="carogngta*" host!="carofuzedd*" host!="*ebiz*" host!="echo*" host!="not logged" host!="onm*" host!="tfnm*" host!="voip*" host!="wfm*"
    category!="Config*" category!="Depend*" category!="Stat*" category!="Large*" category!="Uncaught*" category!="Unvalidated Redirect" category!="License" category!="*Parse*" action=*

| lookup Scanners_Ext.csv forwarded_for OUTPUT forwarded_for as found
| where isnull(found)
| lookup Scanners_Int.csv ip_addr as forwarded_for OUTPUT ip_addr as found 
| where isnull(found)
| lookup vz_nets.csv netblock as forwarded_for OUTPUT netblock as found
| where isnull(found)

| stats count

Here the static NOT clause and the other != comparisons stay in the base search. Each lookup then runs in turn, and any event whose forwarded_for value is found in that lookup is discarded.

Order the three lookups by expected match count: run first the lookup you expect to remove the most events, then the next most selective, and so on.
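One way to estimate that order is to run all three lookups over a short sample time range and count how many events each one matches. This is only a rough sketch: the base search is copied as written in the question and trimmed to `action=*`, and the output field names `in_ext`, `in_int`, `in_vz` and the `*_matches` names are made up for the illustration.

```spl
index:: rasp_ action=*
| lookup Scanners_Ext.csv forwarded_for OUTPUT forwarded_for AS in_ext
| lookup Scanners_Int.csv ip_addr AS forwarded_for OUTPUT ip_addr AS in_int
| lookup vz_nets.csv netblock AS forwarded_for OUTPUT netblock AS in_vz
| stats count AS total, count(in_ext) AS ext_matches, count(in_int) AS int_matches, count(in_vz) AS vz_matches
```

`count(<field>)` only counts events where the field is non-null, so each `*_matches` value is the number of events that lookup would discard. The lookup with the highest count is the one to run first.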

Using NOT and all of your != wildcard comparisons in the base search will be somewhat expensive. You can use TERM() to reduce the amount of data scanned, but that requires knowing your data well.
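As a sketch of the TERM() syntax only, assume (hypothetically) that the raw events contain the literal text `forwarded_for=140.108.26.152` delimited by spaces or other major breakers. Note that this example matches the IP the original query excludes; it is here to show the form, and you would substitute a token that every event you want actually contains.

```spl
index:: rasp_ TERM(forwarded_for=140.108.26.152)
| stats count
```

Without TERM(), the value is split at minor breakers such as `=` and `.` into separate tokens that are looked up individually. TERM() asks for the whole string as one indexed term, which can cut down the events read from disk. It will not match if the raw text differs, for example if the value is wrapped in quotes. As far as I understand, this mainly helps positive matches, since a negated term cannot narrow down which events need to be read.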

 

