
What are the different ways to optimize a large search against a large dataset?

kearaspoor
SplunkTrust

I have a list of 200+ IPs that I need to search against source addresses in our firewall data. The search needs to span several months of these logs and we consistently ingest <200 GB/day of this log type.

I have uploaded the list of 200+ IPs as a lookup table and can get the following search to run over short time frames:

search index=firewall sourcetype="firewall" | search [|inputlookup SampleIPs.csv | fields SampleIPs | rename SampleIPs as source_address ] 
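One way to see exactly what the subsearch above contributes is to run it on its own with `format` appended. This is a sketch that reuses the lookup file and field names from the search above. The result should be a single `search` field holding something like `( ( source_address="10.1.3.5" ) OR ( source_address="10.67.89.145" ) OR ... )`, which is the key=value string Splunk splices into the outer search.

```spl
| inputlookup SampleIPs.csv
| fields SampleIPs
| rename SampleIPs as source_address
| format
```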

But expanding it to a full day takes multiple hours to complete, which makes going back 3+ months infeasible. Searches spanning more than one day hit timeout errors.

Anyone have suggestions on how to better structure this type of search to run more efficiently?


bmacias84
Champion

@kearaspoor, Short of adding more indexers and faster disks, there are a few things you can do to improve search performance.

  1. Always use fast mode.
  2. Be specific and avoid searching on extracted fields. For example, ip=255.255.255.1 relies on a search-time extracted field; search for the raw term "255.255.255.1" instead. Full example: index=foo sourcetype=bar AND (255.255.255.1 OR 10.98.87.1 OR 78.100.89.5). Try it and you will see a significant increase in performance over using key=value.
  3. Use NOT to exclude events you don't need, such as those containing "success". Example: index=foo sourcetype=bar AND (255.255.255.1 OR 10.98.87.1 OR 78.100.89.5) **NOT "success"**.
  4. If those strings also occur in other fields, add a where clause afterward: ... | where isnotnull(ip)
  5. Lastly, you can use the return command in a subsearch to pull values from the lookup and pass them into the main search. Example: index=firewall sourcetype="firewall" [ | inputlookup SampleIPs.csv | return 300 $SampleIPs ] | where isnotnull(source_address)

With $SampleIPs, the return command returns only the values, as a string like
(10.1.3.5) OR (10.67.89.145) OR (89.76.222)
Combined with your base search, it looks something like this:
index=firewall sourcetype="firewall" (10.1.3.5) OR (10.67.89.145) OR (89.76.222)
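A minimal sketch putting tips 2, 4 and 5 together, assuming the lookup file and column are named as in the original question (SampleIPs.csv with a SampleIPs column) and that source_address is extracted at search time. The subsearch hands the indexers bare IP terms to filter on; then a `lookup` against the same file, swapped in here for the `isnotnull` check, keeps only events whose source_address itself is on the list, rather than events where the IP showed up in some other field. The output field name matched_ip is hypothetical.

```spl
index=firewall sourcetype="firewall"
    [| inputlookup SampleIPs.csv | return 300 $SampleIPs ]
| lookup SampleIPs.csv SampleIPs AS source_address OUTPUT SampleIPs AS matched_ip
| where isnotnull(matched_ip)
```

The count passed to return (300 here) must be at least the number of rows in the lookup, since return only emits that many values; with 200+ IPs it leaves some headroom, but it would need raising if the list grows.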

Hope this helps.


kearaspoor
SplunkTrust

I really thought you were onto something with removing the extracted fields. Unfortunately, the lookup returns field/value pairs rather than just the list of IPs, so the subsearch expands to SampleIP=<IP> OR SampleIP=<IP> instead of <IP> OR <IP> 😞
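A sketch of a common workaround, assuming the same SampleIPs.csv lookup: when a subsearch returns a field named `search`, Splunk inserts its values without a field name. Renaming the column to `search` (instead of source_address) should therefore expand to roughly `( ( 10.1.3.5 ) OR ( 10.67.89.145 ) OR ... )`, the bare-term form that tip 2 relies on.

```spl
index=firewall sourcetype="firewall"
    [| inputlookup SampleIPs.csv
     | fields SampleIPs
     | rename SampleIPs AS search ]
```

Because bare terms can match anywhere in the raw event, a follow-up filter on source_address, as in tip 4, may still be needed.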


jensonthottian
Contributor