What's the best way to get a list of forwarders where the splunkd service has stopped running?

MousumiChowdhur
Contributor

Hi!

I need to find all the servers where the splunkd service was running before but is no longer running. I have more than 9000 forwarders, which fall into the three scenarios listed below:

  1. splunkd is not running.
  2. splunkd is running and deployment client is set but indexer configurations are not done.
  3. splunkd is running and indexer configurations are done but deployment client is not set.

Because of these limitations, I am finding it difficult to use queries based on phone home or on internal logs received in Splunk, as they show an incorrect server list.

Also, I'm not allowed to use a script to monitor the splunkd service on each host, as that requires remote login.

Currently I'm using internal logs to find the up and down forwarders, but I'm looking for a better solution.

Thank You.

vikidj
Engager

You can use this to get the number of seconds since each host last sent internal logs, then set a threshold of 60 seconds or more based on your configuration.

index=_internal | eval secondsSinceLastKeepAlive=now()-_time | stats min(secondsSinceLastKeepAlive) AS secondsDead BY host | sort - secondsDead
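
The threshold step can be sketched as a `where` filter on the per-host age. This is only a rough example: the 300-second cutoff and the 24-hour lookback are illustrative values, so adjust both to your environment.

```spl
index=_internal earliest=-24h
| stats max(_time) AS lastSeen BY host
| eval secondsDead = now() - lastSeen
| where secondsDead > 300
| convert ctime(lastSeen)
| sort - secondsDead
```

Keep in mind that a host silent for longer than the lookback window will not show up in the results at all, so the window has to be wider than the longest outage you want to catch.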

vr2312
Contributor

Adding to @dantimola

You can use the three ideas below and tweak them to your requirements:

To find UFs that are no longer generating internal logs, you can schedule an hourly search that identifies how long it has been since each one last forwarded data. You can use the tstats command to reduce search processing.
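
A minimal sketch of that hourly search using tstats, assuming the forwarders send their internal logs to the default `_internal` index. The 60-minute cutoff and the field names `lastSeen` and `minutesSinceLastEvent` are just illustrative choices.

```spl
| tstats latest(_time) AS lastSeen WHERE index=_internal BY host
| eval minutesSinceLastEvent = round((now() - lastSeen) / 60, 0)
| where minutesSinceLastEvent > 60
| convert ctime(lastSeen)
| sort - minutesSinceLastEvent
```

Because tstats reads only indexed metadata rather than raw events, this should stay cheap even with several thousand hosts. Run it over a time range long enough to include the last event from hosts that have been down for a while.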

Splunk's internal logs can be checked and correlated with TCPOutput messages to see whether forwarding is failing.
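
One possible way to look for output problems is sketched below. It assumes the forwarders' splunkd.log events arrive with `sourcetype=splunkd` and that the `component` and `log_level` fields are extracted as they normally are for that sourcetype; verify both against your own data before relying on it.

```spl
index=_internal sourcetype=splunkd component=TcpOutputProc (log_level=WARN OR log_level=ERROR)
| stats count AS outputProblems latest(_time) AS lastProblem BY host
| convert ctime(lastProblem)
| sort - outputProblems
```

This only catches forwarders whose logs eventually reach an indexer, so it is better suited to intermittent failures than to forwarders that never connect.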

Splunk's internal logs can also be correlated with the connections phoning home to the DS. A UF should communicate with the DS every time the DS is restarted (this is the default setting).
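
A rough sketch of the phone-home check follows. It assumes the deployment server's own internal logs are searchable, that phone-home requests appear in `splunkd_access` events with a URI like `/services/broker/phonehome/connection_<ip>_<port>_<hostname>_...`, and that hostnames contain no underscores. The `rex` pattern and the extracted field names are my own guesses, so check them against a few raw events first.

```spl
index=_internal sourcetype=splunkd_access "/services/broker/phonehome/"
| rex "phonehome/connection_(?<client_ip>[^_]+)_(?<client_port>\d+)_(?<client_fqdn>[^_]+)_"
| stats latest(_time) AS lastPhoneHome BY client_fqdn client_ip
| eval minutesSincePhoneHome = round((now() - lastPhoneHome) / 60, 0)
| sort - minutesSincePhoneHome
```

Hosts that phone home here but send nothing to `_internal` would most likely be the ones with the deployment client set and no indexer configuration.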

Hopefully you also have an asset database, which would make it easier to correlate the results and reach out to the server admins.
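
If the asset list can be loaded as a lookup, the comparison could look something like this. The lookup file `asset_inventory.csv` and its `host` column are hypothetical names, and the one-hour cutoff is only an example.

```spl
| inputlookup asset_inventory.csv
| fields host
| join type=left host
    [| tstats latest(_time) AS lastSeen WHERE index=_internal BY host]
| where isnull(lastSeen) OR now() - lastSeen > 3600
| convert ctime(lastSeen)
```

Starting from the asset list means servers that have never sent anything still appear, with an empty `lastSeen`. Host names must match exactly, including case, for the join to line up.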

dantimola
Communicator

Have you tried checking via the Deployment Server? You can check the status of your universal forwarders in the Deployment Server's Forwarder Management. Have you also tried the query below?

| metadata type=hosts index=<index name> | convert ctime(*Time)
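
As a sketch of how the metadata output could be turned into a list of stale hosts, the variant below filters on `recentTime`, the index time of each host's most recent event. Using `_internal` as the index and 900 seconds as the cutoff are assumptions for illustration only.

```spl
| metadata type=hosts index=_internal
| eval secondsSinceLastEvent = now() - recentTime
| where secondsSinceLastEvent > 900
| convert ctime(firstTime) ctime(lastTime) ctime(recentTime)
| sort - secondsSinceLastEvent
| table host firstTime lastTime recentTime secondsSinceLastEvent totalCount
```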

Cheers,
Dan
