Hello. We currently have a network operations dashboard that charts syslog messages from our routers, switches, and firewalls. We use it as our primary network alert dashboard.
We use a Python script with a few hundred lines of regex strings to scan all incoming log alerts from our "net" index and display only the log events NOT matching these filters.
One of our searches is shown below. "netcritical" is our Python script; log events are streamed through it so that the ones matching our filters can be dropped from the Splunk results.
index=net | netcritical | search netcritical=true | timechart limit=0 useother=f c by host | rename c as "Alert Count"
This way we see legitimate alerts, including new alerts from various devices that we were not specifically looking for, while filtering out logs that are noisy or that we don't care about.
It has worked well in our environment so far, but it is not scaling well as we increase the number of network devices writing to syslog. Our dashboard takes a long time to load because it is parsing far too many logs.
Can anyone offer some suggestions on the best way to replace this filter?
We want to keep everything in the same "net" index for log consistency, but we are looking for alternative solutions within Splunk.
It looks like the filter is the bottleneck in your pipeline. Not knowing what the script looks like, I would suggest the following:
regexmacro and see if Splunk's internal regex evaluation engine performs faster than the scripted filter
Hope this helps,
I think the regexes could definitely be improved. I do like the idea of summary indexing and using shorter-period searches.
I am thinking of creating an eventtype with all the various log events, and then having the dashboard show events NOT
Not sure how well that will perform, but it would at least rely on Splunk's own search commands rather than an external Python script scraping millions of events.
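On the point about improving the regexes: the script itself is not shown in this thread, so the following is only a rough sketch of one common tweak, not a description of how netcritical actually works. It assumes the filters can be joined into a single pattern that is compiled once, so each event needs one search call instead of one per regex. The pattern strings and the names NOISE_PATTERNS, is_critical, and filter_events are made up for illustration.

```python
import re

# Hypothetical noise patterns; the real list of a few hundred regexes
# is not shown in the thread.
NOISE_PATTERNS = [
    r"%LINK-3-UPDOWN: Interface \S+, changed state to (up|down)",
    r"%SYS-5-CONFIG_I: Configured from \S+",
    r"logged (in|out) from \d+\.\d+\.\d+\.\d+",
]

# Compile once at import time. Each pattern is wrapped in a non-capturing
# group and joined with "|" so one search call covers every filter.
NOISE_RE = re.compile("|".join(f"(?:{p})" for p in NOISE_PATTERNS))


def is_critical(raw_event: str) -> bool:
    """Return True when the event matches none of the noise patterns."""
    return NOISE_RE.search(raw_event) is None


def filter_events(events):
    """Yield only the events that are not recognised as noise."""
    for event in events:
        if is_critical(event):
            yield event
```

Whether this helps much depends on the patterns themselves. Anchoring them and avoiding leading wildcards usually matters at least as much as combining them, and it would still leave Splunk handing every event in the index to an external script.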