I was under the impression that forwarders send a heartbeat back to the indexers. How can I create an alert for when a forwarder hasn't checked in within the last 5 minutes, for example?
OK, it's way simpler to just enable email on the DMC, clone the stock DMC Alert - Missing Forwarders, and then edit the clone via Advanced Edit. I have awarded points to both people for their efforts and for getting me on the right track.
| inputlookup dmc_forwarder_assets
| search status="missing"
| eval status = "Not Reachable"
| eval "Last Connected" = strftime(last_connected,"%m-%d-%Y %H:%M:%S")
| rename status as Status
| rename os as OS
| rename hostname as "Source Host"
| table "Source Host" "Last Connected" OS Status
Let's take apart the DMC Missing Forwarders alert in Splunk 6.3! It's called DMC Alert - Missing forwarders, with contents:
| inputlookup dmc_forwarder_assets
| search status="missing"
| rename hostname as Instance
So we need to figure out how those dmc_forwarder_assets are created: via a macro called dmc_build_forwarder_assets(1):
`dmc_set_index_internal` sourcetype=splunkd group=tcpin_connections NOT eventType=*
| stats
values(fwdType) as forwarder_type,
latest(version) as version,
values(arch) as arch,
values(os) as os,
max(_time) as last_connected,
sum(kb) as new_sum_kb,
sparkline(avg(tcp_KBps), $sparkline_span$) as new_avg_tcp_kbps_sparkline,
avg(tcp_KBps) as new_avg_tcp_kbps,
avg(tcp_eps) as new_avg_tcp_eps
by guid, hostname
which also includes a second macro, dmc_set_index_internal, which is simply:
index=_internal
Then we have one last macro, dmc_re_build_forwarder_assets(1), which is the essence of the dmc_forwarder_assets lookup:
`dmc_build_forwarder_assets($sparkline_span$)`
| rename new_sum_kb as sum_kb, new_avg_tcp_kbps_sparkline as avg_tcp_kbps_sparkline, new_avg_tcp_kbps as avg_tcp_kbps, new_avg_tcp_eps as avg_tcp_eps
| eval avg_tcp_kbps_sparkline = "N/A"
| addinfo
| eval status = if(isnull(sum_kb) or (sum_kb <= 0) or (last_connected < (info_max_time - 900)), "missing", "active")
| eval sum_kb = round(sum_kb, 2)
| eval avg_tcp_kbps = round(avg_tcp_kbps, 2)
| eval avg_tcp_eps = round(avg_tcp_eps, 2)
| fields guid, hostname, forwarder_type, version, arch, os, status, last_connected, sum_kb, avg_tcp_kbps_sparkline, avg_tcp_kbps, avg_tcp_eps
| outputlookup dmc_forwarder_assets
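To sanity-check what ends up in the lookup, you can inspect it directly (the field names come from the fields line above):
| inputlookup dmc_forwarder_assets
| eval last_connected = strftime(last_connected, "%m-%d-%Y %H:%M:%S")
| table hostname, forwarder_type, status, last_connected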
So what is the real take-away here? Let's pull out the relevant parts:
index=_internal sourcetype=splunkd group=tcpin_connections NOT eventType=*
| stats
max(_time) as last_connected,
sum(kb) as sum_kb by guid, hostname
| addinfo
| eval status = if(isnull(sum_kb) or (sum_kb <= 0) or (last_connected < (info_max_time - 900)), "missing", "active")
| where status="missing"
where we can tweak the value 900 (15 minutes) to whatever we want.
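For example, to match the 5-minute threshold from the original question, a minimal version (an untested sketch) would use 300 seconds:
index=_internal sourcetype=splunkd group=tcpin_connections NOT eventType=*
| stats max(_time) as last_connected, sum(kb) as sum_kb by guid, hostname
| addinfo
| eval status = if(isnull(sum_kb) or (sum_kb <= 0) or (last_connected < (info_max_time - 300)), "missing", "active")
| where status="missing"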
Thank you!
Is there a way for results to keep showing up if I set the relative time range to the last 10 minutes (for alerting)? A host's last connection time might have been an hour ago, which would satisfy the info_max_time - 60 condition but would fall outside a last-10-minutes search window. I would need it to keep showing up when running the search above.
index=_internal sourcetype=splunkd group=tcpin_connections NOT eventType=*
| stats max(_time) as last_connected, sum(kb) as sum_kb by guid, hostname
| addinfo
| eval "Source Host" = hostname
| eval ttnow = now()
| eval Current = strftime(ttnow,"%m-%d-%Y %H:%M:%S")
| eval Status = if(isnull(sum_kb) or (sum_kb <= 0) or (last_connected < (info_max_time - 60)), "Not Reachable", "active")
| eval "Last Connected" = strftime(last_connected,"%m-%d-%Y %H:%M:%S")
| where Status = "Not Reachable"
| table "Source Host" "Last Connected" Current Status
Why can't you just change 900 to 600 and schedule the alert to run every ten minutes? Sorry, I'm not sure I fully understand what you mean.
Say the computer went offline at 1300 hrs: anytime between 1300 and 1310 it will show up in the result set and the alert triggers fine. But when the alert runs after 1320, it won't show up, because the last-connected event falls in the previous 10-minute window, not the current one. If I change the alert to a 60-minute window, it will work until that event falls out of the 60-minute window too. What I need is to keep getting alerts until the forwarder stops being "missing". I hope that wasn't too confusing?
@rfiscus - I'm not positive I'm following, but if I understand correctly, it should still show up after 1320 if it is still disconnected, because the Status field has multiple conditions:
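namely, the eval from the search above:
| eval Status = if(isnull(sum_kb) or (sum_kb <= 0) or (last_connected < (info_max_time - 60)), "Not Reachable", "active")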
So if it is really disconnected, the sum_kb field should be zero and the forwarder should show up again on the next scheduled run of the alert / search.
Ultimately, regarding your comment about the 60-minute window versus the 10-minute window: unless you want to embed a bunch of smaller versions of this search within itself and do some _time bucketing, you'll only have the granularity of the earliest and latest times of the main search. So if the search above ran every hour, and you set the logic to last_connected < (info_max_time - 3600), then you would only have the granularity of one hour. That is, if a forwarder went offline at 13:27, and your alert runs every hour on the hour, then you'd have to wait until 14:00 for your alert to run again and find out that a forwarder went offline. If you wanted to know exactly when machines were going off and on, maybe you could explore changing stats to timechart so you can see those finer time-slices.
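A rough sketch of that timechart idea (the 5-minute span is only an illustration):
index=_internal sourcetype=splunkd group=tcpin_connections NOT eventType=*
| timechart span=5m sum(kb) as sum_kb by hostname
A host whose column goes empty or drops to zero in a given slice stopped sending data during that slice.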
You could use the alerting feature included in the Distributed Management Console and define an alert for the forwarders there.
Check http://docs.splunk.com/Documentation/Splunk/6.2.0/Admin/Platformalerts for details.
I agree, and that works OK. But how can I get these events into the indexers so I can query them via the search head? We do not allow emailing from our DMC.
If you need them to show up in a Splunk index, you could define a script as the alerting action and feed the script's output back into Splunk.
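As a sketch of that approach (the log path and the ALERT_LOG override are hypothetical; Splunk 6.x legacy alert scripts receive the path to the gzipped results file as the eighth argument):

```shell
#!/bin/sh
# Hypothetical alert-action script: Splunk passes the path to the
# gzipped results file as argument 8. We append a timestamped line
# to a log file that a Splunk monitor input then re-indexes.
RESULTS_FILE="${8:-unknown}"
LOG_FILE="${ALERT_LOG:-/tmp/missing_forwarders.log}"
echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') alert=missing_forwarders results=$RESULTS_FILE" >> "$LOG_FILE"
```

You would then point a [monitor://...] input at the log file so the events land in an index searchable from your search head.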
But why would you need that? You can use the alert manager to monitor this as well: http://docs.splunk.com/Documentation/Splunk/6.2.0/Alert/Setupalertactions