I have a list of 200+ IPs that I need to search against the source addresses in our firewall data. The search needs to span several months of these logs, and we consistently ingest under 200 GB/day of this log type.
I have uploaded the list of 200+ IPs as a lookup table and can get the following search to run for short time-frames:
@kearaspoor, the only thing I can think of is more indexers and faster disks, but there are a few things you can do to improve search performance.
Always use fast mode.
Be specific and don't use extracted fields. For example, ip=255.255.255.1 uses an extracted field; instead use "255.255.255.1". Full example: index=foo sourcetype=bar AND (255.255.255.1 OR 10.98.87.1 OR 126.96.36.199). Try it; you will see a significant increase in performance over using key=value.
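A quick side-by-side of the two forms, using the placeholder index/sourcetype from the example above:

```
slower (extracted field):  index=foo sourcetype=bar ip=255.255.255.1
faster (literal string):   index=foo sourcetype=bar "255.255.255.1"
```

Roughly speaking, the quoted literal can be matched against indexed terms up front, while the key=value form relies on search-time field extraction.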
Use NOT to exclude events you don't need, such as successes. Example: index=foo sourcetype=bar AND (255.255.255.1 OR 10.98.87.1 OR 188.8.131.52) **NOT "success"**.
If those strings occur in other fields, add a where statement: ...| where isnotnull(ip)
Lastly, you can use the return command to run a subsearch for certain fields and return the values into the main search. Example: index=firewall sourcetype="firewall" [| inputlookup SampleIPs.csv | return 300 $SampleIPs ] | where isnotnull(source_address)
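Laid out as a sketch (assuming the lookup file is SampleIPs.csv with a column named SampleIPs, as in the example above), note the leading pipe before inputlookup, since it is a generating command:

```
index=firewall sourcetype="firewall"
    [| inputlookup SampleIPs.csv | return 300 $SampleIPs ]
| where isnotnull(source_address)
```

The $ prefix on $SampleIPs tells return to emit only the values rather than field=value pairs, and 300 raises return's row limit above the ~200 IPs in the lookup.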
What the return command does is return a string: (10.1.3.5) OR (10.67.89.145) OR (89.76.222)
And when combined with your base search it looks something like this: index=firewall sourcetype="firewall" (10.1.3.5) OR (10.67.89.145) OR (89.76.222)
I really thought you were on to something by removing the extracted fields. Unfortunately, the lookup returns field/value pairs rather than just the IPs, so the subsearch expands to SampleIP= OR SampleIP= instead of IP OR IP 😞
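One workaround worth sketching here (untested, and assuming the lookup column is named SampleIPs): rename the lookup field to search inside the subsearch, since a subsearch field named search is expanded as raw search terms rather than field=value pairs:

```
index=firewall sourcetype="firewall"
    [| inputlookup SampleIPs.csv | rename SampleIPs AS search | fields search ]
| where isnotnull(source_address)
```

With that rename in place, the subsearch should expand to IP OR IP instead of SampleIPs=IP OR SampleIPs=IP.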