easiest way to detect if splunk forwarder is running on 150 servers

Raghav2384
Motivator

Hey There,
I am new to Splunk (please go easy on my knowledge :)). We have 150 servers that have Splunk forwarders on them. We want to check the status of the forwarders (stopped/running) on a regular basis. I know there's a topic around this (check if hosts are sending any events; if not, the forwarder isn't running). Big question: how can I be sure that it's a forwarder problem and not the host itself? If you can provide a sample search, that'd be great! Thank you for your time.

Regards,
Raghav

1 Solution

lguinn2
Legend

There is no way in Splunk to tell if a host is up or running. As you pointed out, there are lots of searches to tell if the Splunk forwarder on a host is communicating to the indexers - but not if the host itself is up/down. Here is an example:

Query/Alert to detect if a forwarder stops reporting...

HOWEVER, you could write a script (for Linux, Windows or Python) that runs every few minutes and tests for the status of hosts. For example, the script could output a line for each host like this:

<timestamp> hostname=xyz status=up
<timestamp> hostname=abc status=down

And then you could have Splunk monitor this file and use it to report the status of hosts.
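
A minimal sketch of such a script in Python, in case it helps. Everything here is an assumption for illustration: the function names, the timestamp format, and especially the "up" test, which just tries a TCP connection to port 22. A host that is up but has that port closed or firewalled would be reported as down, so swap in whatever check fits your environment (ping, an agent, a monitoring API).

```python
import socket
import time
from typing import Callable, Iterable, List, Optional


def tcp_check(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Assumed 'host is up' test: can we open a TCP connection to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def status_line(host: str, is_up: bool, now: Optional[float] = None) -> str:
    """Format one '<timestamp> hostname=... status=...' line (timestamp in UTC)."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S+0000", time.gmtime(now))
    return f"{stamp} hostname={host} status={'up' if is_up else 'down'}"


def status_lines(
    hosts: Iterable[str],
    check: Callable[[str], bool] = tcp_check,
    now: Optional[float] = None,
) -> List[str]:
    """Run the check against every host and return one line per host."""
    return [status_line(host, check(host), now) for host in hosts]


def append_status(
    path: str,
    hosts: Iterable[str],
    check: Callable[[str], bool] = tcp_check,
) -> None:
    """Append the current status of every host to the file Splunk monitors."""
    with open(path, "a", encoding="utf-8") as fh:
        for line in status_lines(hosts, check):
            fh.write(line + "\n")
```

Scheduled every few minutes (cron, Task Scheduler), a call such as `append_status("host_status.log", ["xyz", "abc"])` would keep appending lines in the format above; the file name is a placeholder for whatever path your monitor input points at.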


rameshyedurla
Explorer

try this
| metadata type=hosts
| eval lastHour=relative_time(now(),"-1h@h")
| eval yesterday=relative_time(now(), "-1d@d")
| where ( recentTime>yesterday AND recentTime<lastHour )

somesoni2
SplunkTrust

This is the one which we are using currently.

index=_internal source=*metrics.log group=tcpin_connections earliest=-2d@d 
| eval Host=coalesce(hostname, sourceHost)
| eval age = (now() - _time )   
| stats  min(age) as age, max(_time) as LastTime by Host   
| convert  ctime(LastTime) as "Last Active On"   
| eval  Status= case(age < 1800,"Running",age >= 1800,"DOWN") | rename age as Age
| sort Status | table Host, Status, Age, "Last Active On" 

The criterion: a forwarder is marked DOWN if no heartbeat has been received from it for 30 minutes (1800 seconds), and Running otherwise.
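
For anyone who wants to check the 30-minute rule outside Splunk, here is a rough Python sketch of the same idea. It is not the search itself, just the logic: take the most recent event time per host, compute its age, and compare against the threshold. The function name and the `(host, epoch_time)` input shape are made up for the example.

```python
from typing import Dict, Iterable, Tuple

DOWN_AFTER_SECONDS = 1800  # 30 minutes, the threshold used in the search above


def forwarder_status(
    events: Iterable[Tuple[str, float]],
    now: float,
    down_after: float = DOWN_AFTER_SECONDS,
) -> Dict[str, Tuple[str, float]]:
    """Map each host to (status, age_in_seconds) from (host, epoch_time) events."""
    last_seen: Dict[str, float] = {}
    for host, event_time in events:
        # Keep only the most recent event time per host.
        if host not in last_seen or event_time > last_seen[host]:
            last_seen[host] = event_time

    result: Dict[str, Tuple[str, float]] = {}
    for host, event_time in last_seen.items():
        age = now - event_time
        # An age of exactly `down_after` counts as DOWN, so every host gets a status.
        result[host] = ("Running" if age < down_after else "DOWN", age)
    return result
```

Note that a host only shows up here (and in the search) if it sent at least one event inside the search window; a forwarder that has been silent for longer than the window does not appear at all.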

linu1988
Champion

Hi Somesh,
I got some inconsistent results where min(age) didn't give the proper timings. I replaced it with latest(age), which I think gives proper results. What do you think?

| stats latest(age) as age, max(_time) as LastTime by Host
| convert ctime(LastTime) as "Last Active On"
| eval Status= case(age < 100,"Running",age > 900,"DOWN") | rename age as Age
| eval Hour=floor(Age/3600) | eval Minute=floor((Age%3600)/60)
| eval Age="-".Hour."h"." : ".Minute."m"

linu1988
Champion

Have you tried the Deployment Monitor app? It's the easiest way to know whether a forwarder is running. Set an alert for when a forwarder stops sending data. It also reports missing sourcetypes and sources, indexing status...

There you go
Splunk Deployment Monitor

Thanks

Cesaredf
Explorer

In the newest version of Splunk, the Splunk Deployment Monitor has been deprecated.
The suggestion is to use Splunk Deployment Monitor instead.

lguinn2
Legend

@Cesaredf - I think you mean the Distributed Management Console (DMC). In Splunk 6.3, the DMC can track forwarders and report if a forwarder goes "missing."

Raghav2384
Motivator

We are in the process 🙂 Thank you!

richgalloway
SplunkTrust

In a nutshell, you need to search both for forwarders and for the hosts. Then you can determine if it's a host problem or a forwarder problem.

Here is the dashboard panel I use for this:

<module name="HiddenSearch" layoutPanel="panel_row5_col1" autoRun="True">
<!-- Find and report on all Splunk Universal Forwarders and endpoints not running SUF.  Skip IPs in the SUFExceptions file. -->
<param name="search"><![CDATA[index=_internal source="/opt/splunk/var/log/splunk/metrics.log*" sourcetype="splunkd" fwdType="*" | 
    dedup sourceHost | rename IPAddress AS hostip, sourceHost AS IPAddress, OS AS fOS | 
    fields IPAddress, hostname, fGUID, fOS, fwdType | append [loadjob savedsearch="my:app:HWDetailBase" |
    rename OS AS hOS | fields IPAddress, ComputerName, hOS] | 
    transaction IPAddress | 
    eval HostName=coalesce(ComputerName, hostname) | eval OS=coalesce(hOS, fOS) | 
    eval "Forwarder State"=if(isnotnull(fwdType),"Running","NOT RUNNING") |
    search [|inputlookup SUFExceptions.csv append=f| fields IPAddress |format "NOT (" "(" "" ")" "OR" ")"] |
    sort "Forwarder State" | table IPAddress, HostName, OS, "Forwarder State"
  ]]></param>
<param name="groupLabel">Forwarder Status</param>
<module name="JobProgressIndicator"></module>
<param name="earliest">-24h</param>
<param name="latest">now</param>
<module name="PostProcess" layoutPanel="panel_row5_col1">
  <param name="search"> | rename "Forwarder State" AS fState | 
                          stats count(eval(fState=="NOT RUNNING")) AS nRun</param>
  <module name="HTML" layoutPanel="panel_row5_col1">
    <param name="html"><![CDATA[
      <table>
      <tr><td>Hosts:</td><td width=3></td><td>$results.resultCount$</td><td width=8></td><td>Not running:</td><td width=3></td><td>$results[0].nRun$</td></tr>
      </table>
      ]]></param>
  </module>
</module>
</module>

The SUFExceptions.csv file contains a single field, IPAddress, and is where I put hosts I know aren't running a forwarder. It saves modifying a lengthy where clause every time there's a change to the exception list.

The HWDetailBase search is a bit too long to list here, but it essentially combines all of our sources of host information (such as port_scan) and returns IPAddress, ComputerName, and OS fields.
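
The core of the panel's logic can be sketched in a few lines of Python, assuming you already have two sets in hand: the IPs known from independent host sources (what HWDetailBase returns) and the IPs seen in the forwarder connection metrics. The function and parameter names are invented for the example, and the addresses in the usage note are documentation placeholders.

```python
from typing import Dict, Iterable


def forwarder_state(
    inventory: Iterable[str],
    reporting: Iterable[str],
    exceptions: Iterable[str] = (),
) -> Dict[str, str]:
    """Label every known IP 'Running' or 'NOT RUNNING', skipping exceptions.

    inventory  -- IPs known from independent sources (e.g. port scans)
    reporting  -- IPs whose forwarder appears in the connection metrics
    exceptions -- IPs known not to run a forwarder (the SUFExceptions list)
    """
    reporting_set = set(reporting)
    skip = set(exceptions)
    all_ips = set(inventory) | reporting_set
    return {
        ip: ("Running" if ip in reporting_set else "NOT RUNNING")
        for ip in sorted(all_ips)
        if ip not in skip
    }
```

For example, `forwarder_state(["192.0.2.1", "192.0.2.2"], ["192.0.2.1"])` labels 192.0.2.2 as NOT RUNNING. Because that host was still seen by an independent source, the gap points at the forwarder rather than the host, provided the inventory data is recent.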

---
If this reply helps you, Karma would be appreciated.

Raghav2384
Motivator

Thank you, Rich! I will try it and keep you posted. I appreciate your time and help.
Thanks,
Raghav

Raghav2384
Motivator

It Worked!!!! Awesome

mmensch
Path Finder

I was just curious if you would be willing to share the script you wrote?

saurabh_tek
Communicator

@Raghav2384, could you please share the script, or suggest something on this, for me and @mmensch?
Thanks in advance.

Raghav2384
Motivator

Thank you, lguinn! I will try the method you posted. I appreciate your time and help.
Thanks,
Raghav
