We have a modest Splunk deployment (a few hundred forwarders, 4 indexers, 2 search heads, and a deployment server) ingesting around 60 GB per day (~1 million events per 5 minutes). We'd like to search these events for specific regexes and, on a hit, hand them off to a script that injects alarms into Zabbix, our monitoring platform.
What I do not want (and assume I cannot do) is to set up 600 real-time searches and run them 24/7. What I especially do not want is to cripple Splunk for the sake of this monitoring.
What I'd like is to run a single real-time search that reads a table (or something similar) and checks each event against it, sending matches to the script with an alarm name (e.g., alarm name = foo if the event matches /foo foo bar/) and passing everything else through. Does anyone know a good way to do this?
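As a rough sketch of that single-pass idea, the matching logic in the downstream script could look like this (the alarm names and patterns here are hypothetical stand-ins, not the actual rule set; in practice the table would be loaded from a CSV lookup or similar):

```python
import re

# Hypothetical alarm table: alarm name -> regex pattern.
ALARM_TABLE = {
    "foo": r"foo foo bar",
    "disk_full": r"No space left on device",
    "oom": r"Out of memory: Kill process",
}

# Compile once at startup; each event is then tested in a single pass.
COMPILED = [(name, re.compile(pat)) for name, pat in ALARM_TABLE.items()]

def match_event(line):
    """Return the first alarm name whose pattern matches the line, else None."""
    for name, rx in COMPILED:
        if rx.search(line):
            return name
    return None
```

Matched events would then be handed to the Zabbix injection script with the returned alarm name; unmatched events simply pass through.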
This idea is very attractive, but I can't think of a way to test for 600+ conditions that would perform fast enough to be effective as a real-time search.
Can you group your 600 searches in some way? For example, grouping them by source, sourcetype or host?
I could see running a few real-time searches plus some scheduled searches that run every 5 minutes or so. Only the critical few items probably deserve a real-time search anyway. Each real-time search consumes a CPU core, so if you want a dozen of them running at once, you should probably set up a separate search head just for your alerting searches.
If you group your searches, then you could probably use lookup tables to further match and refine your alerts.
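To illustrate the grouping idea on the script side, each event could be tested only against the patterns registered for its sourcetype, rather than all 600 (the sourcetype names and patterns below are hypothetical examples):

```python
import re

# Hypothetical grouping: sourcetype -> list of (alarm_name, compiled_regex).
GROUPS = {
    "syslog": [("foo", re.compile(r"foo foo bar"))],
    "access_combined": [("http_500", re.compile(r'" 500 '))],
}

def match_event_grouped(sourcetype, line):
    """Test a line only against the patterns for its sourcetype group."""
    for name, rx in GROUPS.get(sourcetype, []):
        if rx.search(line):
            return name
    return None
```

The win is that a syslog event never pays the cost of the web-access patterns and vice versa, which is the same effect as splitting the one big search into a handful of per-sourcetype searches.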
There's no clean way to group them into significantly smaller searches. I'm not tied to a real-time search per se; a five-minute gap would be fine. Still a daunting task.
Yeah, that's about what I was thinking. I was hoping there was some way to do this cleanly, but now I'm starting to think we may need to explore workarounds.