Deployment Architecture

Why is the metadata type=hosts command for *nix search heads showing incorrect lastTime and recentTime?

hlarimer
Communicator

I am using the metadata type=hosts command to alert me when a forwarder goes down, and I now want to extend it to search heads. The command works great for *nix forwarders, but for *nix search heads it shows that two of our three search heads haven't reported in 82 days. Both are up and forwarding their _internal logs to the indexers.

Any ideas why this is reporting incorrectly?
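
For reference, a forwarder-down check built on metadata might look like the sketch below. The 1800-second threshold and the restriction to index=_internal are assumptions for illustration, not the poster's actual alert. In the metadata output, lastTime is the latest event timestamp seen for the host, while recentTime is the index time at which the indexer most recently received an event from it.

```
| metadata type=hosts index=_internal
| eval secondsSinceLastSeen = now() - recentTime
| where secondsSinceLastSeen > 1800
| convert ctime(lastTime) ctime(recentTime)
| table host lastTime recentTime secondsSinceLastSeen
```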

1 Solution

martin_mueller
SplunkTrust

I'd recommend switching to tstats for this kind of reporting, it'll still be blazingly fast and much more flexible. For example:

| tstats latest(_time) where index=_internal by host

That'll give you the latest timestamp for each host in the _internal index. If those SHs have sent any event more recently than 82 days ago, it'll find it.

As for actually monitoring your deployment, take a look at the new Distributed Management Console that was just released together with 6.2 - awesome stuff.

martin_mueller
SplunkTrust

Heh... yeah, if the time reported is wrong then it's no surprise results based on the time are wrong as well.

martin_mueller
SplunkTrust

Maybe this is a copypasta error, but the host showing long no-rep times you posted isn't the host showing events in the second query.

hlarimer
Communicator

Sorry, I wrote it incorrectly in the result. NOREPP02 is the computer that is showing up in the tstats search.

hlarimer
Communicator

I had another thought this morning about what may be affecting this. The report is run against all of our Windows universal forwarders, which all report directly to the indexer... except the one that keeps showing up as not reporting. It sits in our DMZ and sends its data through a Splunk forwarder. I don't fully understand tstats, so could this be causing the problem?

Strangely enough, it shows that it's reporting, just that it's always been more than a day since it last reported.

martin_mueller
SplunkTrust

That could be related to weird timestamping. Compare events from a "good" forwarder to events from this forwarder, particularly with respect to fields such as _time, date_*, timestartpos, timeendpos.
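
One way to make that comparison is to measure the gap between index time and event time per host. The sketch below assumes the winevents index from this thread; goodhost is a hypothetical stand-in for a forwarder known to report correctly. A host whose clock is off should show a lag far from zero: large and positive if its clock runs behind, negative if it runs ahead.

```
index=winevents (host=goodhost OR host=norepp02)
| eval lagSeconds = _indextime - _time
| stats min(lagSeconds) avg(lagSeconds) max(lagSeconds) latest(_time) as latestEvent by host
| convert ctime(latestEvent)
```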

hlarimer
Communicator

That was exactly it!! The clock on the forwarder was set incorrectly. Thanks again for your help Martin.

martin_mueller
SplunkTrust

Give the field a usable name:

| tstats latest(_time) as latestTime where ...

You could enclose latest(_time) in single quotes in eval like this: 'latest(_time)', but that's a last resort. Renaming the field is better than that.
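
Putting the rename together with the calculation from the question, a complete search might look like the following sketch. The field names latestTime and timeSinceLastMessage are arbitrary choices, and the final eval that formats the value as a duration is optional.

```
| tstats latest(_time) as latestTime where index=_internal by host
| eval timeSinceLastMessage = now() - latestTime
| eval timeSinceLastMessage = tostring(timeSinceLastMessage, "duration")
```

If the field were left unrenamed, the single-quote form of the second line would read eval timeSinceLastMessage = now() - 'latest(_time)'.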

hlarimer
Communicator

Martin,
Thanks for all your help. I have this functioning correctly but there is one computer that keeps showing up as not reporting recently. When I search Splunk though I see that it is reporting regularly. Here is the search:

| tstats latest(_time) AS Latest where index=winevents by host | eval LastMessageTime = now() - Latest | where LastMessageTime > 1800 | eval LastMessageTime = tostring(LastMessageTime, "duration") | eval Latest = strftime(Latest, "%x %X") | sort -LastMessageTime

It returns a result for AMSPEPP02 and says its Latest is about 18 hours ago.

But when I run this search:
index=winevents host=norepp02
I see that it reported a few minutes ago.

Any ideas why that one computer keeps showing up like that?

hlarimer
Communicator

Thanks Martin, that gave me the correct information, but I'm having trouble using latest(_time) in calculations. I want to create a calculation that shows how long it has been since the last message, and I was trying to use this:

| tstats latest(_time) where index=_internal by host | eval timeSinceLastMessage = now() - latest(_time)

I get an error though because it doesn't recognize latest(_time) in the eval function. How can I use that column of data?
