So we have this alert set up to check whether any of the hostnames we monitor haven't received any time-monitoring data. The current search is as follows:
| inputlookup TimeServersV2.csv
| search server="*"
| eval HOST=lower(server)
| fields HOST
| where NOT [ search (index=os sourcetype=test_stats*) OR (sourcetype=syslog ptp10 OR phc10sys) OR (index=windows sourcetype="Script:TimeStatus") OR (index=windows sourcetype=domtimec) OR (index=os sourcetype=time)
    | dedup host
    | eval HOST=lower(host)
    | fields HOST ]
The issue, we believe, is that when it runs at 8 AM it takes a bit longer to run and process the data, and it sends out partial results after a minute or so of running. We have a lot of saved reports/alerts/searches running at the top of most hours, so I think it may be sending out incomplete results as Splunk starts the next job. I moved its cron schedule up an hour and a half to a lighter-use hour, which may help a bit, but I would also like to optimize this search so it runs faster. Currently it takes about 40 seconds to a little over a minute.
What would be the best way to optimize this search so it could run in under 30 seconds, if possible? Running it outside of the scheduled time takes about 6 seconds; it's only slow when it runs alongside all of the other searches. It sends us an alert with a list of hostnames it found that were not on the list, yet when we run it manually, it only returns 4 or 5 results. That's why we think it isn't finishing the search before it sends out the alert. Any help would be appreciated.
1. There are already a few apps that keep track of sources and check whether ingestion from them has stopped abruptly - for example, TrackMe. Maybe it's worth checking out one or two of them instead of reinventing the wheel?
2. You don't want a potentially long-running (or many-result) search as a subsearch of a short-running one. The long-running subsearch might be silently finalized prematurely. The way to go (without changing the overall logic) would be to do a "basic" search over the indexes, append the results from the lookup, and compare the two sets.
A general idea (pseudosearch):
index=whatever OR index=somewhere_else
| stats values(host) as host
| mvexpand host
| eval where="indexes"
| append [ | inputlookup mylookup
| table host
| eval where="lookup" ]
| stats values(where) as where by host
This way you get a table of your hosts along with an indication of whether each one was included in your lookup, in the index data, or both.
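For your alert case (hosts that are in the lookup but have no index data), you could finish the pseudosearch above with a filter on that combined table - a sketch, using the field names from the example:

| stats values(where) as where by host
| where mvcount(where)=1 AND mvindex(where,0)="lookup"

The mvcount/mvindex check keeps only hosts whose "where" field contains just the "lookup" value, i.e. hosts that never showed up in the index data.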
EDIT: There is one more thing worth remembering - since "host" is an indexed field, you can replace the raw-data search plus stats with tstats - it will give you a rocket boost. But if you wanted another, non-indexed field, searching once a day over a whole day's worth of data might simply take a long time if you have loads of events (a typical example here would be firewall logs - they generate huge amounts of data). In that case you could think about accelerating your report or using a summary index.
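A rough tstats version of that idea, plugged into the index/sourcetype names from the question (a sketch - adjust to your environment; note that tstats can only filter on indexed fields, so the syslog branch that matches the raw terms ptp10/phc10sys can't be expressed here and would still need a raw search or its own sourcetype):

| tstats count where (index=os sourcetype=test_stats*) OR (index=windows sourcetype="Script:TimeStatus") OR (index=windows sourcetype=domtimec) OR (index=os sourcetype=time) by host
| eval host=lower(host)
| fields host
| eval where="indexes"
| append
    [ | inputlookup TimeServersV2.csv
    | eval host=lower(server)
    | table host
    | eval where="lookup" ]
| stats values(where) as where by host

Because tstats reads only the index-time metadata instead of raw events, this side of the search should drop from tens of seconds to low single digits even under scheduler load.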