Here's one possible solution that I think would work if there are constant events coming in from each source:
search source="a" | head 1 | append [search source="b" | head 1] | stats min(_time) as LatestReliableTime
How else would I know I have a complete picture of all the events from all sources up to some timestamp?
You mean min(_time) by source? I need to take the min to ensure that all the other sources have forwarded their events. This assumes that each source will continually have new events. What if there are no events from source B for days? Then that "reliable" time is going to lag behind.
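To make the lag concern concrete, here is a minimal Python sketch (not Splunk code) of the same logic as the head 1 / append / stats min(_time) search. It assumes events are plain epoch timestamps grouped by source name; the function name latest_reliable_time is hypothetical.

```python
from typing import Dict, List


def latest_reliable_time(events_by_source: Dict[str, List[float]]) -> float:
    """Newest timestamp that every source is known to have reached.

    Same idea as `head 1` per source followed by `stats min(_time)`:
    take each source's newest event, then the oldest of those.
    """
    if not events_by_source or any(not stamps for stamps in events_by_source.values()):
        raise ValueError("every source needs at least one event")
    newest_per_source = {src: max(stamps) for src, stamps in events_by_source.items()}
    return min(newest_per_source.values())


events = {
    "a": [1000.0, 1060.0, 1120.0],  # source a keeps sending
    "b": [1000.0, 1030.0],          # source b has gone quiet
}
print(latest_reliable_time(events))  # 1030.0
```

However far source a advances, the result stays pinned at source b's last event, which is exactly the lag described above.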
Ah, right, sorry! Yes, min is correct. So then, we have something like
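earliest=-2d | STATS max(_time) BY source | RENAME "max(_time)" AS tmax | STATS min(tmax) AS LatestReliableTime | CONVERT ctime(LatestReliableTime)
which produces a single answer: the newest time that every source has reached (the max per source, then the global minimum). Note the earliest=-2d, which restricts the search to the last 2 days; a source with no events in that window simply drops out of the result.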
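A minimal Python sketch of the windowed idea, assuming epoch-second timestamps and a fixed two-day window standing in for earliest=-2d: keep only recent events, take each source's newest one, then the minimum across sources. The names reliable_time_in_window and TWO_DAYS are hypothetical.

```python
from typing import Dict, List, Optional

TWO_DAYS = 2 * 86400  # seconds, standing in for earliest=-2d


def reliable_time_in_window(
    events_by_source: Dict[str, List[float]],
    now: float,
    window: float = TWO_DAYS,
) -> Optional[float]:
    cutoff = now - window
    newest: Dict[str, float] = {}
    for src, stamps in events_by_source.items():
        recent = [t for t in stamps if t >= cutoff]
        if recent:  # a source with nothing in the window is skipped entirely
            newest[src] = max(recent)
    # None when no source has any event in the window
    return min(newest.values()) if newest else None


events = {"a": [190000.0, 199000.0], "b": [10000.0]}
print(reliable_time_in_window(events, now=200000.0))  # 199000.0
```

The idle source b no longer drags the answer down, but it is also silently ignored, so the result can look healthier than it really is.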
Why not just use a metadata search, similar to the ones provided here? http://www.splunk.com/wiki/Deploy:HowToFindLostForwarders
| metadata type=hosts | eval age = now() - lastTime | search age > 86400 | sort age d | convert ctime(lastTime) | fields age,host,lastTime
The 'lastTime' field will tell you when Splunk last received an event from that host, and because you're searching the metadata, it should be a very quick answer.
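For readers less familiar with SPL, here is a rough Python equivalent of the eval/search/sort steps in that metadata query. It assumes the metadata results arrive as dicts with host and lastTime keys (epoch seconds); stale_hosts is a hypothetical name, and the conversion to readable time (convert ctime) is left out.

```python
import time
from typing import Dict, List, Optional


def stale_hosts(
    rows: List[Dict], threshold: float = 86400, now: Optional[float] = None
) -> List[Dict]:
    now = time.time() if now is None else now
    # eval age = now() - lastTime
    aged = [dict(row, age=now - row["lastTime"]) for row in rows]
    # search age > 86400, then sort age d (most stale first)
    stale = [row for row in aged if row["age"] > threshold]
    return sorted(stale, key=lambda row: row["age"], reverse=True)


rows = [
    {"host": "host1", "lastTime": 999000.0},
    {"host": "host2", "lastTime": 800000.0},
    {"host": "host3", "lastTime": 900000.0},
]
print([r["host"] for r in stale_hosts(rows, now=1000000.0)])  # ['host2', 'host3']
```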
Doesn't lastTime still correspond to the last event that occurred for the host, rather than the last time it got an update from the forwarder? According to that link you sent, there used to be a "heartbeat" message sent from the forwarder. Why did that go away?
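I'm borrowing from Mick's answer here. I just want to point out that you can use this metadata approach to capture two different scenarios: hosts whose most recent events are too old (the forwarder may be down), and hosts whose most recent events are timestamped in the future.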
I would like to make the argument that both are of equal importance to monitor, for the following reasons:
(If a host sends events with timestamps in the future, lastTime will not reflect the correct current time until the clock catches up to that point. Therefore, if the forwarder goes down within that time frame, an alert that only looks for old events will not detect it.)
Here is a search that can detect both situations:
| metadata index=_internal type=hosts | eval age=time()-lastTime | search age>60 OR age<-15 | sort age d | convert ctime(lastTime) | fields age,host,lastTime
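The two-sided test in that search (age>60 OR age<-15) can be sketched in Python as follows. The labels "stale", "future" and "ok" and the function name classify_host are hypothetical; the default thresholds are the 60 and 15 seconds from the search.

```python
def classify_host(last_time: float, now: float,
                  stale_after: float = 60, future_tolerance: float = 15) -> str:
    age = now - last_time
    if age > stale_after:        # age>60: nothing recent, forwarder may be down
        return "stale"
    if age < -future_tolerance:  # age<-15: latest event is timestamped in the future
        return "future"
    return "ok"


for last_time in (900.0, 990.0, 1100.0):
    print(last_time, classify_host(last_time, now=1000.0))
```

A small negative tolerance is allowed so that ordinary clock drift between hosts does not trigger the "future" case.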
There are a few things to note here:
If your forwarders send their internal logs to the indexer, using index=_internal will let you check for a down forwarder on a much shorter interval, since metrics events are generated every 30 seconds. If you are not, then pick your most active index, or you could use something like
| metadata type=hosts | append [| metadata index=os type=hosts ] | stats max(lastTime) as lastTime by host
to query more than one index. (Try to pick an index that is less prone to timestamp configuration glitches, which could prevent this alert from working properly.)
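What the append plus stats max(lastTime) by host step does can be sketched in Python: combine several per-index result sets and keep the newest lastTime for each host. It assumes each result set is a list of dicts with host and lastTime keys; merge_last_times is a hypothetical name.

```python
from typing import Dict, Iterable


def merge_last_times(*result_sets: Iterable[Dict]) -> Dict[str, float]:
    merged: Dict[str, float] = {}
    for rows in result_sets:
        for row in rows:
            host, last = row["host"], row["lastTime"]
            # keep the newest lastTime seen for each host
            if host not in merged or last > merged[host]:
                merged[host] = last
    return merged


default_index = [{"host": "web1", "lastTime": 100.0}, {"host": "db1", "lastTime": 50.0}]
os_index = [{"host": "db1", "lastTime": 70.0}, {"host": "app1", "lastTime": 10.0}]
print(merge_last_times(default_index, os_index))
```

Taking the max means a host counts as alive if it shows up recently in any of the queried indexes.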
Note that I'm using time() here and not now(). This is because we want the current system clock rather than the time the search was scheduled to run. This gives a more accurate value for age, which can otherwise be skewed by delays in scheduled search execution. (Note that if you are running a version prior to 4.1, you'll need to use now() instead of time(), and you will need to change the -15 threshold to something like -60 or -90, depending on your potential scheduler delays.)
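A simplified Python model of the scheduler-delay problem, assuming (as described above) that now() returns the scheduled time and time() the wall clock when the eval runs. The function name ages and the numbers are illustrative only.

```python
from typing import Tuple


def ages(last_time: float, scheduled_at: float, delay: float) -> Tuple[float, float]:
    started_at = scheduled_at + delay        # when the search actually runs
    age_with_now = scheduled_at - last_time  # now(): the scheduled time
    age_with_time = started_at - last_time   # time(): the wall clock
    return age_with_now, age_with_time


# Search scheduled at t=1000 but delayed 45 s; a healthy host logged an event at t=1040.
print(ages(last_time=1040.0, scheduled_at=1000.0, delay=45.0))  # (-40.0, 5.0)
```

With now(), the healthy host produces an age of -40 and would trip the age<-15 check, which is why the older-version advice widens that threshold to cover the expected scheduler delay.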