Getting Data In

Easiest way to detect if the Splunk forwarder is running on 150 servers

Motivator

Hey There,
I am new to Splunk (please go easy on my knowledge :)). We have 150 servers that have Splunk forwarders on them, and we want to check the status of the forwarders (stopped/running) on a regular basis. I know there's a topic about this (check whether hosts are sending any events; if not, the forwarder isn't running). The big question: how can I be sure that it's a forwarder problem and not the host itself? If you can provide a sample search, that'd be great! Thank you for your time.

Regards,
Raghav


Legend

Splunk has no built-in way to tell whether a host itself is up. As you pointed out, there are lots of searches that tell you whether the Splunk forwarder on a host is communicating with the indexers, but not whether the host itself is up or down. Here is an example:

Query/Alert to detect if a forwarder stops reporting...

HOWEVER, you could write a script (for Linux, Windows or Python) that runs every few minutes and tests for the status of hosts. For example, the script could output a line for each host like this:

<timestamp> hostname=xyz status=up
<timestamp> hostname=abc status=down

And then you could have Splunk monitor this file and use it to report the status of hosts.
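
A minimal Python sketch of such a script, under stated assumptions: "up" is taken to mean that a TCP connection to port 22 succeeds (swap in whatever port or check fits your servers), and every function name here is hypothetical, not something from Splunk.

```python
import socket
from datetime import datetime, timezone


def is_host_up(hostname, port=22, timeout=3.0):
    """Assumption: a host is "up" if a TCP connection to `port` succeeds."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def format_status_line(hostname, up, now=None):
    """Build one '<timestamp> hostname=... status=...' line."""
    now = now or datetime.now(timezone.utc)
    status = "up" if up else "down"
    return f"{now.isoformat(timespec='seconds')} hostname={hostname} status={status}"


def status_lines(hosts, check=is_host_up, now=None):
    """Check every host and return one status line per host."""
    return [format_status_line(host, check(host), now) for host in hosts]


def append_status(path, lines):
    """Append the lines to the file that Splunk monitors."""
    with open(path, "a", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
```

Scheduled every few minutes (cron or Task Scheduler), a call such as `append_status(path, status_lines(hosts))` keeps the file current; the path and the host list are yours to choose. Because each line carries `hostname=` and `status=` key/value pairs, Splunk's automatic field extraction should pick them up at search time.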


Explorer

try this
| metadata type=hosts
| eval lastHour=relative_time(now(),"-1h@h")
| eval yesterday=relative_time(now(), "-1d@d")
| where ( recentTime>yesterday AND recentTime<lastHour )

Revered Legend

This is the one which we are using currently.

index=_internal source=*metrics.log group=tcpin_connections earliest=-2d@d
| eval Host=coalesce(hostname, sourceHost)
| eval age = (now() - _time)
| stats min(age) as age, max(_time) as LastTime by Host
| convert ctime(LastTime) as "Last Active On"
| eval Status=case(age < 1800,"Running",age >= 1800,"DOWN") | rename age as Age
| sort Status | table Host, Status, Age, "Last Active On"

The criterion is this: a forwarder counts as DOWN if no heartbeat has been received from it for 30 minutes, and as Running otherwise.

Champion

Hi Somesh,
I got some inconsistent results where min(age) didn't give the proper timings. I have replaced it with latest, which I think gives the proper results. What do you think?

| stats latest(age) as age, max(_time) as LastTime by Host
| convert ctime(LastTime) as "Last Active On"
| eval Status=case(age < 100,"Running",age > 900,"DOWN")
| rename age as Age
| eval Hour=floor(Age/3600), Minute=floor((Age%3600)/60)
| eval Age="-".Hour."h"." : ".Minute."m"


Champion

Have you tried the Deployment Monitor app? It's the easiest way to know whether a forwarder is running. You can set an alert for when a forwarder stops sending data. It also reports missing sourcetypes and sources, indexing status...

There you go
Splunk Deployment Monitor

Thanks

Explorer

In the newest version of Splunk, the Splunk Deployment Monitor has been deprecated.
The suggestion is to use Splunk Deployment Monitor instead.


Legend

@Cesaredf - I think you mean the Distributed Management Console (DMC). In Splunk 6.3, the DMC can track forwarders and report if a forwarder goes "missing."


Motivator

We are in the process 🙂 Thank you!


SplunkTrust

In a nutshell, you need to search both for forwarders and for the hosts. Then you can determine if it's a host problem or a forwarder problem.

Here is the dashboard panel I use for this:

<module name="HiddenSearch" layoutPanel="panel_row5_col1" autoRun="True">
<!-- Find and report on all Splunk Universal Forwarders and endpoints not running SUF.  Skip IPs in the SUFExceptions file. -->
<param name="search"><![CDATA[index=_internal source="/opt/splunk/var/log/splunk/metrics.log*" sourcetype="splunkd" fwdType="*" | 
    dedup sourceHost | rename IPAddress AS hostip, sourceHost AS IPAddress, OS AS fOS | 
    fields IPAddress, hostname, fGUID, fOS, fwdType | append [loadjob savedsearch="my:app:HWDetailBase" |
    rename OS AS hOS | fields IPAddress, ComputerName, hOS] | 
    transaction IPAddress | 
    eval HostName=coalesce(ComputerName, hostname) | eval OS=coalesce(hOS, fOS) | 
    eval "Forwarder State"=if(isnotnull(fwdType),"Running","NOT RUNNING") |
    search [|inputlookup SUFExceptions.csv append=f| fields IPAddress |format "NOT (" "(" "" ")" "OR" ")"] |
    sort "Forwarder State" | table IPAddress, HostName, OS, "Forwarder State"
  ]]></param>
<param name="groupLabel">Forwarder Status</param>
<module name="JobProgressIndicator"></module>
<param name="earliest">-24h</param>
<param name="latest">now</param>
<module name="PostProcess" layoutPanel="panel_row5_col1">
  <param name="search"> | rename "Forwarder State" AS fState | 
                          stats count(eval(fState=="NOT RUNNING")) AS nRun</param>
  <module name="HTML" layoutPanel="panel_row5_col1">
    <param name="html"><![CDATA[
      <table>
      <tr><td>Hosts:</td><td width=3></td><td>$results.resultCount$</td><td width=8></td><td>Not running:</td><td width=3></td><td>$results[0].nRun$</td></tr>
      </table>
      ]]></param>
  </module>
</module>
</module>

The SUFExceptions.csv file contains a single field, IPAddress, and is where I put hosts I know aren't running a forwarder. It saves modifying a lengthy where clause every time there's a change to the exception list.

The HWDetailBase search is a bit too long to list here, but it essentially combines all of our sources of host information (such as port_scan) and returns IPAddress, ComputerName, and OS fields.
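
The decision itself (host problem or forwarder problem) can be sketched in a few lines of Python. This is only an illustration under assumptions: the 1800-second limit borrows the 30-minute heartbeat criterion mentioned earlier in this thread, and the function names, labels, and input shapes are hypothetical rather than anything Splunk provides.

```python
def classify(host_up, forwarder_age, max_age=1800):
    """Decide where the problem lies for one host.

    host_up: result of an independent host check (True/False).
    forwarder_age: seconds since the forwarder last reported,
                   or None if it has never been seen.
    max_age: assumed limit, 30 minutes by default.
    """
    if not host_up:
        return "HOST DOWN"
    if forwarder_age is None or forwarder_age >= max_age:
        return "FORWARDER DOWN"
    return "OK"


def report(host_status, forwarder_ages, max_age=1800):
    """Combine both sources; hosts missing from forwarder_ages count as never seen."""
    return {
        host: classify(up, forwarder_ages.get(host), max_age)
        for host, up in host_status.items()
    }
```

For example, `report({"xyz": True, "abc": False}, {"xyz": 4000})` flags `xyz` as a forwarder problem (the host answers but the forwarder has been silent for over 30 minutes) and `abc` as a host problem.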


Motivator

Thank you Rich! will try and keep you posted. I appreciate your time and help.
Thanks,
Raghav



Motivator

It Worked!!!! Awesome


Path Finder

I was just curious if you would be willing to share the script you wrote?


Communicator

@Raghav2384, could you please share the script, or suggest something along these lines, for me and @mmensch?
Thanks in advance.


Motivator

Thank you Iguinn! I will try the method you posted. I appreciate your time & help.
Thanks,
Raghav
