We are using Splunk to alert when we see specific events in our logs.
There are hundreds of different log events we might get, and a few that need to be alerted on.
We have created an event type so we can make our searches quicker, but even the event type configuration is very very large.
The search looks something like this
index = weblogs
Logsfile = “error1”
OR Logfile = “error2”
OR Logfile = “error3”
OR Logfile = “error4”
OR Logfile = “error5”
OR Logfile = “error6”
OR Logfile = “error7”
And on and on
The list ends up around 60-70 different OR statements and the list is growing all the time.
What is the best way to reduce the size of this massive search?
You could put them all in a lookup and use a subsearch to return the massive or statement. (If you want to be able to edit the lookup on the filesystem programmatically, or prefer this method for some reason)
You could use a search macro (probably the easiest method)
You could use a summary search in conjunction with one of the methods above and run it every x minutes, then use the summary index for the alert search. (To help with performance)
A couple of things.
This article has a lot of good stuff for improving searches: http://docs.splunk.com/Documentation/Splunk/latest/Search/Writebettersearches
Aside from that article, you could try some of these things:
After the base search, try piping to the fields command to specify only the fields you're using. For example, if your search has something like stats count by fieldA, fieldB, FieldC or a table fieldA, fieldB, FieldC, try something like this:
... | fields fieldA, fieldB, FieldC | rest of your search
This prevents unnecessary field extractions and improves performance. Try to only specify fields you're using.
Can you narrow the search terms down any further? Splunk likes when you're specific. Is there a specific sourcetype associated with the index? Is there another field consistent throughout all the results?
How big of a time frame are you running this search over? You could consider running the search over smaller periods of time, and either write it to a report or write the output to a namespace. Then call it using tstats.
You could try scheduling a report and accelerating it.