Splunk Search

What are the different ways to optimize a large search against a large dataset?

Communicator

I have a list of 200+ IPs that I need to search against source addresses in our firewall data. The search needs to span several months of these logs and we consistently ingest <200 GB/day of this log type.

I have uploaded the list of 200+ IPs as a lookup table and can get the following search to run for short time-frames:

search index=firewall sourcetype="firewall" | search [|inputlookup SampleIPs.csv | fields SampleIPs | rename SampleIPs as source_address ] 

But expanding it to a full day takes multiple hours to complete, which makes going back 3+ months infeasible. Trying to span more than one day ends in timeout errors.

Anyone have suggestions on how to better structure this type of search to run more efficiently?

Champion

@kearaspoor, the only thing I can think of on the infrastructure side is more indexers and faster disks, but there are a few things you can do to improve search performance.

  1. Always use fast mode.
  2. Be specific and avoid filtering on extracted fields. For example, ip=255.255.255.1 filters on an extracted field; search for the raw term "255.255.255.1" instead. Full example: index=foo sourcetype=bar AND (255.255.255.1 OR 10.98.87.1 OR 78.100.89.5). Try it and you will see a significant increase in performance over using key=value.
  3. Use NOT to exclude events you don't need, such as successes. Example: index=foo sourcetype=bar AND (255.255.255.1 OR 10.98.87.1 OR 78.100.89.5) NOT "success"
  4. If those strings can also occur in other fields, add a where clause: ... | where isnotnull(ip)
  5. Lastly, you can use the return command in a subsearch to pass field values back into the main search. Example: index=firewall sourcetype="firewall" [ | inputlookup SampleIPs.csv | return 300 $SampleIPs ] | where isnotnull(source_address)

The return command returns a string such as
(10.1.3.5) OR (10.67.89.145) OR (89.76.222)
When combined with your base search, it looks something like this:
index=firewall sourcetype="firewall" (10.1.3.5) OR (10.67.89.145) OR (89.76.222)
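
As a rough sketch of how points 2, 4, and 5 might fit together: the first subsearch supplies the bare IP terms for the initial filter, and a second pass keeps only events where one of those IPs is actually the source_address (a raw-term match can also hit a destination address). This assumes the lookup column is named SampleIPs and the extracted field is source_address, as in the question; it has not been benchmarked against real firewall data.

```spl
index=firewall sourcetype="firewall"
    [ | inputlookup SampleIPs.csv | return 300 $SampleIPs ]
| search
    [ | inputlookup SampleIPs.csv
      | fields SampleIPs
      | rename SampleIPs AS source_address ]
```

The 300 passed to return is the maximum number of rows to return; it only needs to be at least as large as the number of rows in the lookup.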

Hope this helps.

Communicator

I really thought you were on to something by removing the extracted fields. Unfortunately, the lookup returns field/value pairs rather than just the IP list, so the subsearch expands to SampleIP= OR SampleIP= rather than IP OR IP 😞
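
One possible way around this, sketched under the assumption that the lookup column is named SampleIPs as in the original search: a subsearch field named search (or query) is handed back to the outer search as bare values rather than field=value pairs, so renaming the column should yield plain IP terms. Prefixing the field name with $ in the return command (return 300 $SampleIPs) is meant to have the same effect.

```spl
index=firewall sourcetype="firewall"
    [ | inputlookup SampleIPs.csv
      | fields SampleIPs
      | rename SampleIPs AS search ]
```

To see what the subsearch expands to before running it over months of data, the inner part can be run on its own with the format command appended:

```spl
| inputlookup SampleIPs.csv
| fields SampleIPs
| rename SampleIPs AS search
| format
```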

Contributor