Deployment Architecture

Why is the metadata type=hosts command for *nix search heads showing incorrect lastTime and recentTime?

Communicator

I am using the metadata type=hosts command to alert me when a forwarder goes down, and I now want to extend it to search heads. The command works great for *nix forwarders, but for *nix search heads it shows that 2 of our 3 search heads haven't reported in 82 days. Both are up and forwarding their _internal logs to the indexers.

Any ideas why this is reporting incorrectly?

1 Solution

SplunkTrust

I'd recommend switching to tstats for this kind of reporting; it'll still be blazingly fast and much more flexible. For example:

| tstats latest(_time) where index=_internal by host

That'll give you the latest timestamp for each host in the _internal index. If those SHs have sent any event more recently than 82 days ago, it'll find it.

As for actually monitoring your deployment, take a look at the new Distributed Management Console that was just released together with 6.2 - awesome stuff.

SplunkTrust

Heh... yeah, if the time reported is wrong then it's no surprise results based on the time are wrong as well.

SplunkTrust

Maybe this is a copypasta error, but the host showing long no-rep times you posted isn't the host showing events in the second query.

Communicator

Sorry, I wrote it incorrectly in the result. NOREPP02 is the computer that is showing up in the tstats search.

Communicator

I had another thought this morning about something that may be affecting this. The report is run against our Windows universal forwarders, which all report directly to the indexer... except the one that keeps showing up as not reporting. It sits in our DMZ and sends its data through a Splunk forwarder. I don't fully understand tstats, so could this be causing the problem?

Strangely enough, it does show as reporting, just that it's always been more than a day since it last reported.

SplunkTrust

That could be related to weird timestamping. Compare events from a "good" forwarder to events from this forwarder, particularly with respect to fields such as _time, date_*, timestartpos, timeendpos.
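
One way to run that comparison, as a rough sketch: put a known-good host and the suspect host side by side and look at how far the parsed event time is from the index time. The host names good_host and suspect_host are placeholders, index=winevents is taken from the search posted in this thread, and indexLag, latestEventTime, latestIndexTime and avgIndexLag are made-up field names. _indextime is the time the indexer wrote the event, so a large or negative gap on just one host usually points at that host's clock or its timestamp extraction.

```
index=winevents (host=good_host OR host=suspect_host)
| eval indexLag = _indextime - _time
| stats latest(_time) as latestEventTime max(_indextime) as latestIndexTime avg(indexLag) as avgIndexLag by host
| eval latestEventTime = strftime(latestEventTime, "%x %X"), latestIndexTime = strftime(latestIndexTime, "%x %X")
```

If avgIndexLag is a few seconds for the good host and many hours for the suspect one, the events are arriving on time but carrying the wrong timestamp.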

Communicator

That was exactly it!! The clock on the forwarder was set incorrectly. Thanks again for your help Martin.

SplunkTrust

Give the field a usable name:

| tstats latest(_time) as latestTime where ...

You could enclose latest(_time) in single quotes in eval like this: 'latest(_time)', but that's a last resort. Renaming the field is the better option.
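
Put together, a minimal sketch of the renamed version, assuming the _internal index from the original search (latestTime and timeSinceLastMessage are just illustrative names):

```
| tstats latest(_time) as latestTime where index=_internal by host
| eval timeSinceLastMessage = now() - latestTime
```

And the single-quoted variant, which leaves the default column name in place:

```
| tstats latest(_time) where index=_internal by host
| eval timeSinceLastMessage = now() - 'latest(_time)'
```

Both should give the same number of seconds per host; the renamed field is simply easier to reuse in later eval, where, and sort steps.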

Communicator

Martin,
Thanks for all your help. I have this functioning correctly, but there is one computer that keeps showing up as not having reported recently. When I search Splunk, though, I see that it is reporting regularly. Here is the search:

| tstats latest(_time) AS Latest where index=winevents by host | eval LastMessageTime = now() - Latest | where LastMessageTime > 1800 | eval LastMessageTime = tostring(LastMessageTime, "duration") | eval Latest = strftime(Latest, "%x %X") | sort -LastMessageTime

It returns a result for AMSPEPP02 and says its Latest is about 18 hours ago.

But when I run this search:
index=winevents host=norepp02
I see that it reported a few minutes ago.

Any ideas why that one computer keeps showing up like that?

Communicator

Thanks Martin, that gave me the correct information, but I'm having trouble using latest(_time) in calculations. I want to calculate how long it has been since the last message and was trying to use this:

| tstats latest(_time) where index=_internal by host | eval timeSinceLastMessage = now() - latest(_time)

I get an error though because it doesn't recognize latest(_time) in the eval function. How can I use that column of data?
