Getting Data In

How to create an alert for a forwarder that hasn't checked in within the last 5 minutes in Splunk 6.3.0?

rfiscus
Path Finder

I was under the impression that forwarders send a heartbeat back to the indexers. How can I create an alert for when a forwarder hasn't checked in within the last 5 minutes, for example?

0 Karma

rfiscus
Path Finder

OK, it's way simpler to just enable email on the DMC, clone the stock DMC Alert - Missing Forwarders, and then modify it via Advanced Edit. I have awarded points to both people for their efforts and for getting me on the right track.

| inputlookup dmc_forwarder_assets
| search status="missing" 
| eval status = "Not Reachable"
| eval "Last Connected" = strftime(last_connected,"%m-%d-%Y %H:%M:%S")
| rename status as Status
| rename os as OS
| rename hostname as "Source Host"
| table "Source Host" "Last Connected" OS Status
0 Karma

aljohnson_splun
Splunk Employee

Let's take apart the DMC Missing forwarders alert in Splunk 6.3!

It's called DMC Alert - Missing forwarders, with contents:

| inputlookup dmc_forwarder_assets
| search status="missing" 
| rename hostname as Instance

So we need to figure out how that dmc_forwarder_assets lookup is created, which happens via a macro called dmc_build_forwarder_assets(1):

`dmc_set_index_internal` sourcetype=splunkd group=tcpin_connections NOT eventType=* 
| stats 
values(fwdType) as forwarder_type, 
latest(version) as version, 
values(arch) as arch, 
values(os) as os, 
max(_time) as last_connected, 
sum(kb) as new_sum_kb, 
sparkline(avg(tcp_KBps), $sparkline_span$) as new_avg_tcp_kbps_sparkline, 
avg(tcp_KBps) as new_avg_tcp_kbps, 
avg(tcp_eps) as new_avg_tcp_eps 
by guid, hostname

which itself includes a second macro, dmc_set_index_internal, which is simply:

index=_internal

Then we have one last macro, dmc_re_build_forwarder_assets(1), which is what actually builds the dmc_forwarder_assets lookup:

`dmc_build_forwarder_assets($sparkline_span$)` 
| rename new_sum_kb as sum_kb, new_avg_tcp_kbps_sparkline as avg_tcp_kbps_sparkline, new_avg_tcp_kbps as avg_tcp_kbps, new_avg_tcp_eps as avg_tcp_eps 
| eval avg_tcp_kbps_sparkline = "N/A" 
| addinfo 
| eval status = if(isnull(sum_kb) or (sum_kb <= 0) or (last_connected < (info_max_time - 900)), "missing", "active") 
| eval sum_kb = round(sum_kb, 2) 
| eval avg_tcp_kbps = round(avg_tcp_kbps, 2) 
| eval avg_tcp_eps = round(avg_tcp_eps, 2) 
| fields guid, hostname, forwarder_type, version, arch, os, status, last_connected, sum_kb, avg_tcp_kbps_sparkline, avg_tcp_kbps, avg_tcp_eps 
| outputlookup dmc_forwarder_assets

TL;DR

So what is the real takeaway here? Let's pull out the relevant parts:

index=_internal sourcetype=splunkd group=tcpin_connections NOT eventType=* 
| stats
max(_time) as last_connected,
sum(kb) as sum_kb by guid, hostname
| addinfo
| eval status = if(isnull(sum_kb) or (sum_kb <= 0) or (last_connected < (info_max_time - 900)), "missing", "active") 
| where status="missing"

where we can tweak the value 900 (15 minutes) to whatever we want.
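
For the 5-minute requirement in the original question, a minimal sketch would swap 900 for 300; the rest follows the relevant parts above, and it assumes the alert runs roughly every 5 minutes with a time range wide enough to still cover the forwarders you care about:

index=_internal sourcetype=splunkd group=tcpin_connections NOT eventType=*
| stats max(_time) as last_connected, sum(kb) as sum_kb by guid, hostname
| addinfo
| eval status = if(isnull(sum_kb) or (sum_kb <= 0) or (last_connected < (info_max_time - 300)), "missing", "active")
| where status="missing"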

claudio_manig
Communicator

Thank you!

0 Karma

rfiscus
Path Finder

Is there a way for hosts to keep showing up in this search if I set the relative time range to 10 minutes (for alerting)? For example, a host's last connection might have been an hour ago, which would satisfy the info_max_time - 60 condition, but that event wouldn't fall within the Last 10 minutes relative time range of the search. I would need it to keep showing up when running the search above.

index=_internal sourcetype=splunkd group=tcpin_connections NOT eventType=* 
| stats  max(_time) as last_connected, sum(kb) as sum_kb by guid, hostname
| addinfo
| eval "Source Host" = hostname
| eval ttnow = now()
| eval Current = strftime(ttnow,"%m-%d-%Y %H:%M:%S")
| eval Status = if(isnull(sum_kb) or (sum_kb <= 0) or (last_connected < (info_max_time - 60)), "Not Reachable", "active") 
| eval "Last Connected" = strftime(last_connected,"%m-%d-%Y %H:%M:%S")
| where Status = "Not Reachable"
| table "Source Host" "Last Connected" Current Status
0 Karma

aljohnson_splun
Splunk Employee

Why can't you just change 900 to 600 and schedule the alert to run every ten minutes? Sorry, I'm not sure I'm following you completely.

0 Karma

rfiscus
Path Finder

Say the computer went offline at 1300 hrs. Anytime between 1300 and 1310 it will show up in the result set and the alert triggers fine. But when the alert runs after 1320, it won't show up, because the last-connected event falls in the previous 10-minute window, not the current one. If I change the alert to a 60-minute window, it will work until that event falls out of the 60-minute time window. Meaning we should keep getting alerts until it stops saying "missing". I hope that wasn't too confusing?

0 Karma

aljohnson_splun
Splunk Employee

@rfiscus - I'm not positive I'm following, but if I understand correctly, it should still show up after 1320 if it is still disconnected, because the Status field has multiple conditions:

  • isnull(sum_kb)
  • sum_kb <= 0
  • last_connected < (info_max_time - 60)

So if it is really disconnected, the sum_kb field should be zero and the forwarder should show up again on the next scheduled run of the alert / search.

Ultimately, regarding your comment about the 60-minute window versus the 10-minute window: unless you want to embed a bunch of smaller versions of this search within itself and do some _time bucketing, you'll only have the granularity of the earliest and latest times of the main search. So if the search above ran every hour and you set the logic to last_connected < (info_max_time - 3600), then you would only have a granularity of one hour. That is, if a forwarder went offline at 13:27 and your alert runs every hour on the hour, you'd have to wait until 14:00 for the alert to run again and find out that a forwarder went offline. If you want to know exactly when machines are going off and on, you could explore changing stats to timechart so you can see those finer time slices.
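
Here is a minimal sketch of that timechart idea; the 10-minute span and the 24-hour window are illustrative assumptions, not part of the stock DMC search. A bucket where a host's sum(kb) drops to zero or goes empty points at the time slice when it stopped sending:

index=_internal sourcetype=splunkd group=tcpin_connections NOT eventType=* earliest=-24h
| timechart span=10m limit=0 useother=f sum(kb) by hostname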

0 Karma

DMohn
Motivator

You could use the alerting feature included in the Distributed Management Console and define an alert for the forwarders there.

Check http://docs.splunk.com/Documentation/Splunk/6.2.0/Admin/Platformalerts for details.

rfiscus
Path Finder

I agree, and that works OK. How can I get these events into the indexers so I can query them via the search head? We do not allow emailing from our DMC.

0 Karma

DMohn
Motivator

If you need them to show up in a Splunk index, you could define a script as the alerting action, and feed the script output back into Splunk again.

But why would you need that? You can also use the alert manager to monitor this: http://docs.splunk.com/Documentation/Splunk/6.2.0/Alert/Setupalertactions

0 Karma