I am trying to develop a search that can identify missing logs based on the average time between log entries for each specific host. I presently have a search string ( | metadata type=hosts | where recentTime < now() - 60 | eval lastSeen = strftime(recentTime, "%F %T") | fields + host lastSeen ) that identifies missed logs based on the amount of time elapsed from "now", but I would like to calculate an average time and identify missed logs based on that average plus some percentage.
Example: A certain log comes in approximately every 5 minutes. I want that log to appear in a list if it has not been seen for, say, 5.75 minutes (the 5-minute average plus 0.75 minutes, i.e. 15%). I want to do this for each host. I think I will just need to insert the logic in the area of "now() - 60".
The problem is that you need to identify the average, and the metadata command does not provide that information, or sufficient information to compute it. Let's say that you want to derive the average time between events over the past 30 days to establish the threshold for each source; this could be a pretty expensive calculation if you have hundreds of sources and terabytes of events.
There are a lot of possible approaches, but I would do this:
Assume that the csv would look like this, with the average arrival time listed in seconds:
source,avgArrivalTime
/var/log/message.log,85465
/var/log/web/access.log,93
The search to create the lookup table would be something like:
index=* | fields + source | fields - _raw
| streamstats latest(_time) as latest_time by source window=1 current=f global=f
| eval delta = latest_time - time
| stats avg(delta) as avgArrivalTime by source
| table source avgArrivalTime
| outputlookup app=t avgTime.csv
The search to identify the "missing" logs would be something like this:
| tstats latest(_time) as lastEventTime where index=* by source
| eval timeDelay = now() - lastEventTime
| lookup avgTime.csv source OUTPUT avgArrivalTime
| where timeDelay > (avgArrivalTime * 1.15)
The output of the second search is a list of sources where no new events have been indexed within the normal window + 15%.
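Since the question asks about hosts rather than sources, the same comparison can be moved back into the original metadata search. This is only a sketch: it assumes avgTime.csv was built with "by host" instead of "by source" (a variation not shown above), so that the lookup has host and avgArrivalTime columns.

```spl
| metadata type=hosts
| lookup avgTime.csv host OUTPUT avgArrivalTime
| where recentTime < now() - (avgArrivalTime * 1.15)
| eval lastSeen = strftime(recentTime, "%F %T")
| fields + host lastSeen avgArrivalTime
```

Hosts with no row in the lookup get a null avgArrivalTime and are dropped by the where clause, so they would need separate handling.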
Of course there are lots of ways to do this, but hopefully this will give you a good start.
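One cheaper alternative, as a rough sketch: the average gap between consecutive events equals (last event time - first event time) / (event count - 1), so the lookup can be built from tstats alone without streaming every event. The 30-day window and the avgTime.csv file name follow the example above; firstTime and lastTime are just illustrative field names, and outputlookup is used here without options, which overwrites the file on each run.

```spl
| tstats count min(_time) as firstTime max(_time) as lastTime where index=* earliest=-30d by source
| where count > 1
| eval avgArrivalTime = (lastTime - firstTime) / (count - 1)
| fields + source avgArrivalTime
| outputlookup avgTime.csv
```

Sources with a single event in the window have no gap to measure and are filtered out by count > 1.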
Hello,
I've been working on using the above SPL to build a feed monitoring alert, and I would like to point out a small but important logic error in it. The search that creates the lookup table references "time", which gives the current time during execution. Therefore, as you increase the time range of the search, the delta becomes the difference between the current time and the time of the event. What actually needs to happen is a comparison of event time and ingest time, which gives you the delta in event delivery.
Line 3 in the lookup should be replaced with:
| eval delta = _indextime - latest_time
I tried this answer and it started with an error on app=t and did not create the CSV file. It did, however, create the table with each host listed, but with no avgArrivalTime. Please advise.
Try append=t instead of app=t.
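For anyone landing here later, a hedged sketch of the lookup-building search with the issues raised in this thread addressed. It assumes the goal is still the original one (the average gap between consecutive events of the same source, not ingest latency), so it subtracts the built-in _time field rather than _indextime. It also drops the option on outputlookup entirely, since the default behaviour is to overwrite the file; the field name next_time and the 30-day window are illustrative choices.

```spl
index=* earliest=-30d
| fields + source
| streamstats current=f window=1 global=f latest(_time) as next_time by source
| eval delta = next_time - _time
| stats avg(delta) as avgArrivalTime by source
| outputlookup avgTime.csv
```

Because Splunk returns events newest first, the "previous" event in the stream is the next one in time, so delta comes out non-negative. The first event seen for each source has no predecessor, gets a null delta, and is ignored by avg().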
I was using tstats for this purpose until this application was recommended by a Splunk PS person:
Meta Woot
It should mostly solve your issue, although you might need to do some custom searching based on the lookups it creates for you.
Good suggestion!