Hello Splunkers,
I'm looking for a logic suggestion for building an SPL query.
Scenario: alert/report when a data feed has stopped reporting to Splunk. Each feed has its own frequency (for example, one app sends data once every 10 minutes, a few send once a day, and a few send once every 7 days), so the logic has to be built around the frequency.
I'm focusing on tstats because it responds faster and limits resource utilization. However, with tstats I don't get a latest event time for an index whose logs stopped reporting a week or 2 days ago when I run the search over the last 1 day. The metadata command gives lastTime even when the time period is the last 5 minutes, but it is slower than tstats.
My logic is
|inputlookup frequency_data.csv
|fields index sourcetype frequency
|join type=left index sourcetype
    [|tstats latest(_time) as latest_event WHERE index=* by index sourcetype ]
|eval latest_event=coalesce(latest_event,"0")
|eval current_time =now()
|eval buffer = if(latest_event="0", "current_time", current_time-latest_event)
|eval feed_status= case(latest_event=0, "Feed Stopped", buffer> frequency, "Feed delayed", buffer<frequency, "Feed Healthy")
Looks like the logic is not returning correct results. Kindly provide some assistance.
There are numerous questions like this, as it's a common need.
As for your search, it's best to run the | tstats BEFORE handling the lookup. JOIN should never be used; there is always a better way than join.
In this case, it's inputlookup+stats, e.g.
| tstats latest(_time) as latest_event WHERE index=* by index sourcetype
| inputlookup append=t frequency_data.csv
| fields index sourcetype frequency latest_event
| fillnull value=0 latest_event
| stats max(latest_event) as latest_event values(frequency) as frequency by index sourcetype
| eval current_time = now()
| eval buffer = if(latest_event=0, current_time, current_time-latest_event)
| eval feed_status = case(latest_event=0, "Feed Stopped", buffer>frequency, "Feed delayed", buffer<=frequency, "Feed Healthy")
If you are not running your search with a time window > max(frequency), you will not be able to detect those scenarios where data stopped some time before your time window: a healthy once-a-week feed and a feed that stopped a month ago both come back with no latest_event.
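To make the time window point concrete, here is a minimal sketch of widening only the tstats part of the search. It assumes the slowest feed in frequency_data.csv reports once every 7 days, so the -8d lookback is an illustrative value, not something from your data:

```
| tstats latest(_time) as latest_event WHERE index=* earliest=-8d latest=now by index sourcetype
```

The rest of the search stays the same. The trade-off is that every run now covers 8 days of index data, which is likely to cost more than a short window if the alert runs every few minutes.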
So, you generally have to maintain the 'last event' in a more regular search where you store the last event from the index/sourcetype, e.g. in your frequency_data.csv. I've never found metadata a useful command and it's not reliable.
You could look at TrackMe - it's pretty good, but needs some setup and management.
I have written my own version that runs an hourly search, collects the latest event time from a monitored set of hosts/indexes/sourcetypes and stores it, so the "missing data" search can find the latest time for those longer frequencies.
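As a rough illustration of that kind of setup (a sketch only, not my actual search), a scheduled search could keep the last seen time in a separate lookup. The file name feed_last_seen.csv and the 2-hour overlap window are hypothetical choices, and the lookup has to exist before the first run, for example by running the search once without the inputlookup line over a long time range:

```
| tstats latest(_time) as latest_event WHERE index=* earliest=-2h by index sourcetype
| inputlookup append=t feed_last_seen.csv
| stats max(latest_event) as latest_event by index sourcetype
| outputlookup feed_last_seen.csv
```

The alert search can then read the stored times instead of scanning the indexes, so its own time window no longer matters. This assumes frequency in frequency_data.csv is expressed in seconds:

```
| inputlookup frequency_data.csv
| lookup feed_last_seen.csv index sourcetype OUTPUT latest_event
| fillnull value=0 latest_event
| eval buffer = now() - latest_event
| eval feed_status = case(latest_event=0, "Feed Stopped", buffer>frequency, "Feed delayed", true(), "Feed Healthy")
```

Because the first search takes the max of the old and new values, running it with overlapping windows is harmless.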
I know that technically it's the same but I always struggle not to wince when I see latest(_time). max(_time) is the way to go 🙂