Getting Data In

What's the best way to monitor Splunk itself?

ben_leung
Builder

I would like to start a discussion about how the community monitors its Splunk deployments. What methods do you use?

How would you manage hundreds, if not thousands, of Splunk instances across multiple data centers, all of which can be clustered into groups/deployments?

0 Karma

woodcock
Esteemed Legend

gjanders
SplunkTrust

I wrote the app Alerts For Splunk Admins for this purpose. Some of the alerts are built on the monitoring console and some are much more detailed; they cover the failure scenarios I've found in the past and any contributed by others.

0 Karma

svasani_splunk
Splunk Employee

You can take a look at the new health endpoints available in Splunk v7.1:
http://docs.splunk.com/Documentation/Splunk/7.1.2/DMC/Usefeaturemonitoring#Query_the_server.2Fhealth...
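For anyone who wants to script against the health endpoint, here is a minimal sketch in Python. It assumes the endpoint path is /services/server/health/splunkd on the management port (8089), that basic authentication is enabled, and that the JSON reply carries the overall status under entry[0].content.health. Those details are my assumptions and should be checked against the documentation for your version; the function names are made up for illustration.

```python
import base64
import json
import ssl
import urllib.request


def parse_health(payload: str) -> str:
    """Extract the overall health colour from the endpoint's JSON reply.

    Assumed response shape: {"entry": [{"content": {"health": "green"}}]}
    """
    doc = json.loads(payload)
    return doc["entry"][0]["content"]["health"]


def fetch_health(host: str, user: str, password: str,
                 port: int = 8089, timeout: float = 5.0) -> str:
    """Query splunkd's health endpoint and return e.g. 'green', 'yellow' or 'red'."""
    # Assumed endpoint path; verify it for your Splunk version.
    url = f"https://{host}:{port}/services/server/health/splunkd?output_mode=json"
    req = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")

    # Many installs use a self-signed certificate on the management port.
    # Verification is disabled here only to keep the sketch short.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
        return parse_health(resp.read().decode("utf-8"))
```

Calling fetch_health("splunk01.example.com", "monitor", "secret") (hypothetical host and credentials) would then return the status string, and anything other than "green" could raise an alert.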

0 Karma

wrangler2x
Motivator

I use a product/service called Omnicenter, from Netreo. It monitors the health of ports 8000 and 8089 and sends an email alert to my group if anything is amiss there. We use it to monitor all of our critical systems. I also wrote a small script that tests whether my RAID array is writable and puts a zero or a one in a node in the SNMP tree, which Omnicenter polls regularly (we once had a situation where the RAID got into a weird state in which it was readable but not writable).
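A write test like the one described can be sketched in a few lines of Python. This is not the original script, just an illustration: the function name is made up, and I am assuming 1 means "writable" and 0 means "not writable". Publishing the value into the SNMP tree is left out because it depends on the SNMP agent in use.

```python
import os
import tempfile


def is_writable(directory: str) -> int:
    """Return 1 if a small file can be written and synced in `directory`, else 0."""
    try:
        # The probe file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".write_probe_") as probe:
            probe.write(b"ok")
            probe.flush()
            # Force the data to disk so a read-only or hung volume is detected.
            os.fsync(probe.fileno())
        return 1
    except OSError:
        return 0
```

Pointing it at a directory on the array, for example is_writable("/opt/splunk/var/lib/splunk") (a hypothetical path), gives the value the poller would pick up.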

0 Karma

MuS
SplunkTrust

Hi ben_leung,

Like any other IT system/server, Splunk needs basic monitoring from the outside.
A good start is to monitor the main Splunk process, splunkd, and the Splunk helper processes. This can be done with a basic script that calls the $SPLUNK_HOME/bin/splunk status command.
You can also check whether the ports are up and listening; a simple telnet to the Splunk ports will do the trick.
But keep in mind that much more could be involved, such as SAN, NFS, network and so on.
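The two basic checks above (process status and port reachability) could look roughly like this in Python. It is a sketch under assumptions: the default install path /opt/splunk is a guess, the function names are invented, and I am assuming `splunk status` exits with 0 when splunkd is running, which you should confirm on your own hosts.

```python
import socket
import subprocess


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (a scripted 'telnet')."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def splunk_status_ok(splunk_home: str = "/opt/splunk") -> bool:
    """Run `$SPLUNK_HOME/bin/splunk status` locally.

    Assumption: exit code 0 means splunkd is running.
    """
    try:
        result = subprocess.run(
            [f"{splunk_home}/bin/splunk", "status"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or the command hung.
        return False
    return result.returncode == 0
```

For example, port_open("localhost", 8089) checks the management port and port_open("localhost", 8000) checks Splunk Web.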

hope this helps ...

cheers, MuS

ben_leung
Builder

Does it make sense to install crontabs on 3000+ machines to watch Splunk? Automation tools like Ansible would be a great way to reach remote hosts at massive scale. What I hope others can share is how they check the Splunk process. There are so many tools to try out and test; I'm looking for the perfect solution, hehe.

0 Karma

MuS
SplunkTrust

Sorry to say, but nobody but you will be able to provide the perfect solution for your setup 😉

serwin
Explorer

The SoS app works well.

You can also write a search that checks splunkd.log for stop and start events. See below:

index=_internal source=*splunkd.log host=* component=IndexProcessor ("shutting down: end" OR "Initializing: readonly")   | eval restart_status=if(message="shutting down: end","Stopping","Starting")  
0 Karma

ben_leung
Builder

Currently I have a script that just hits splunkd via the REST API and checks whether a response comes back. If not, the process or something else is down. Is there a better way to check thousands of hosts?
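One way to scale a check like this from a single central machine is to run it concurrently. The sketch below is only an illustration: it swaps the REST call for a plain TCP connect to the management port (8089) to keep the footprint small, and the function names and worker count are arbitrary choices of mine. Any other per-host check can be passed in through the `check` parameter.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def splunkd_reachable(host: str, port: int = 8089, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to splunkd's management port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_down_hosts(hosts, check=splunkd_reachable, max_workers=64):
    """Run `check` against every host concurrently and return the hosts that failed."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with hosts.
        results = list(pool.map(check, hosts))
    return [host for host, ok in zip(hosts, results) if not ok]
```

With a 3-second timeout and 64 workers, a few thousand hosts finish in minutes at worst, and nothing has to be installed on the monitored machines.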

0 Karma

Thomas_Aneiro
Explorer

Or you could use the Splunk on Splunk app. I would advise getting the add-on as well (either Windows or Linux).

https://splunkbase.splunk.com/app/748/

0 Karma

woodcock
Esteemed Legend

Splunk has finally answered this problem definitively with v6.2's new Distributed Management Console (DMC):

http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/ConfiguretheMonitoringConsole

ben_leung
Builder

What if the instance or host that is running the DMC goes down? If I am correct, a DMC can only monitor one SHC, so we would need multiple DMCs set up for multiple SHCs.

Ideally I want a tool that can monitor more than 10 different SHCs, with nodes numbering in the hundreds. And that's not even counting all the forwarders 😞

I really just want the basic check: is splunkd running? If not, please let me know NOW. I'm looking for the check with the smallest possible footprint.

0 Karma