After upgrading to Splunk 6.5.0, what is the best way to alert when forwarders don't check in for more than 2-3 minutes?

johnpof
Path Finder

I'm currently using a very old deployment monitor search to determine when forwarders are down, and it doesn't seem to work very well in 6.5 (false positives and missed alerts). I know the Monitoring Console has some additional functionality.

Does anyone have a specific search for this? I'm hoping to alert if forwarders don't check in for 2-3 minutes.

Jeremiah
Motivator

There is a forwarder dashboard in the DMC that you can enable. It has an associated alert that notifies you of missing forwarders. The dashboard shows forwarder status (active/missing, version, data volume, etc.). Note that this dashboard is strictly for forwarders, not for data coming in via TCP inputs (there's another dashboard for that). A forwarder is considered missing after 15 minutes without checking in.

http://docs.splunk.com/Documentation/Splunk/6.5.0/DMC/ForwardersDeployment

Splunk relies on a saved search that looks at the tcpin metrics reported by your indexers to build this dashboard and report on missing forwarders. If you have a lot of forwarders, this search can put a heavy load on your indexers (the setup page in the DMC also warns about this). It also relies on the forwarder GUID to uniquely identify each forwarder. So if you reimage a host but keep the hostname, or reinstall the forwarder for some reason, the old GUID will show up as a missing forwarder even though the host is still sending data. To clear out these stale entries, you'll need to periodically rebuild the forwarder asset data in the DMC (it's just a button click).
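For the 2-3 minute window in the question, a search over the same tcpin metrics can be run directly. This is a minimal sketch, not the DMC's actual saved search. It assumes the hostname and guid fields that indexers write on the group=tcpin_connections lines of metrics.log, a 24-hour lookback, and a 180-second threshold (all chosen for illustration). Grouping by hostname instead of guid means a reimaged host that keeps its name is not reported as missing.

```spl
index=_internal source=*metrics.log* group=tcpin_connections earliest=-24h
| stats latest(_time) AS lastSeen, values(guid) AS guids BY hostname
| eval age=now()-lastSeen
| where age > 180
| convert ctime(lastSeen) AS "Last Seen"
| table hostname, guids, "Last Seen", age
```

Indexers write these metrics about every 30 seconds by default, so 180 seconds allows for a few missed intervals. A forwarder that has been silent for longer than the lookback drops out of the results entirely, so widen earliest if that matters. Scheduled every few minutes with an alert condition of "number of results greater than 0", it would fire roughly one schedule interval after a forwarder goes quiet. The load caveat above applies here too.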

Beyond this, to arrive at the right search you need to consider how many forwarders you have, how much your environment changes, and what you really want to monitor for (i.e., missing forwarders or missing data).

reedmohn
Communicator

I'm not sure what you mean when you say you "tried grabbing the search that the DMC uses".

The DMC alert is disabled by default. Did you try enabling it?

johnpof
Path Finder

Thanks for the lengthy reply. That all makes sense from a monitoring perspective, and I've done a solid amount of research on that side of it.

Specifically, though, I was looking for a best-practice way of being alerted when forwarders are down or missing. I've tried grabbing the search that the DMC uses, but I've had no luck.

I spent a lot of time googling before posting this, but every search I've tried based on my findings has not worked as intended.

Jeremiah
Motivator

If you've got details about what you've tried, I'd love to know what and why it hasn't worked for you.

gjanders
SplunkTrust

There is an app called "Broken Hosts" which might help here.

If not, something like this might work:

| metadata type=hosts index=_internal
| eval age=now()-recentTime
| eval status=if(age<1200,"UP","DOWN")
| convert ctime(recentTime) AS "Last Active On"
| rename age AS Age
| eval Hour=floor(Age/3600), Minute=floor((Age%3600)/60)
| eval Age="-".Hour."h : ".Minute."m"
| table host, status, "Last Active On", Age
| search status=DOWN
| lookup dnslookup clienthost AS host OUTPUT clientip
| where isnotnull(clientip)

Note that I'm using the DNS lookup because we de-register DNS entries when a host is decommissioned. Otherwise, just remove the lookup and the clientip filter that follows it.
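The 1200 in the status check above is the threshold in seconds (20 minutes), so for a 2-3 minute alert it would need to drop to around 180. As an alternative to metadata, tstats can return the latest indexed _internal event time per host. This is a sketch that assumes every forwarder sends its own _internal logs (universal forwarders do by default), with the 180-second threshold chosen to match the question; run it over a time range such as the last 24 hours.

```spl
| tstats latest(_time) AS lastSeen WHERE index=_internal BY host
| eval age=now()-lastSeen
| where age > 180
| convert ctime(lastSeen) AS "Last Active On"
| table host, "Last Active On", age
```

With a threshold this tight, indexing lag on a busy indexer can cause false positives, so it is worth watching the age values for a while before alerting on them.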

johnpof
Path Finder

I've tried a lot of searches, including ones with | metadata, and they all had some weirdness. This one actually looks really accurate and promising; I think I can make it work.

Thanks, guys!
