I would like to start a discussion about how the community monitors their Splunk deployments. What are some of the methods you use?
How would you manage hundreds, if not thousands, of Splunk instances across multiple data centers, all of which can be clustered into groups/deployments?
Try this GREAT app:
https://splunkbase.splunk.com/app/3796/
From this session:
https://conf.splunk.com/files/2017/slides/howd-you-get-so-big-tips-tricks-for-growing-your-splunk-de...
I wrote the app Alerts For Splunk Admins for this purpose. Some of the alerts are built on the monitoring console, and some are much more detailed; they cover the failure scenarios I've found in the past, plus any contributed by others.
You can take a look at the new health endpoints available in Splunk v7.1:
http://docs.splunk.com/Documentation/Splunk/7.1.2/DMC/Usefeaturemonitoring#Query_the_server.2Fhealth...
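As a rough sketch of polling that endpoint with curl — the host, port, and credentials below are placeholders, and the sed-based JSON extraction is a quick stand-in for a real JSON parser:

```shell
#!/bin/sh
# Sketch: poll the v7.1+ splunkd health endpoint and pull out the overall
# status. Host, port, and credentials are placeholders for your deployment.

# Extract the first "health":"..." value from a JSON response on stdin.
health_from_json() {
  sed -n 's/.*"health"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# -k tolerates the self-signed certificate most 8089 listeners use.
check_health() {
  curl -sk -u admin:changeme \
    "https://localhost:8089/services/server/health/splunkd?output_mode=json" \
    | health_from_json
}

# Demo of the extraction against a canned response:
echo '{"entry":[{"content":{"health":"green"}}]}' | health_from_json
```

An alerting wrapper would then compare the extracted value against "green" and page when it isn't.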
I use a product/service called OmniCenter, from Netreo. It monitors the health of ports 8000 and 8089 and sends an email alert to my group if anything is amiss. We use it for monitoring all of our critical systems. I also wrote a small script that tests whether my RAID array is writable and puts a zero or a one in a node in the SNMP tree, which OmniCenter polls regularly (we once had a situation where the RAID got into a weird state where it was readable but not writable).
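A minimal sketch of that kind of writability probe — the directory argument stands in for the RAID mount point, and the SNMP hookup mentioned in the comment is one possible wiring, not the poster's actual script:

```shell
#!/bin/sh
# Sketch of a writability probe like the one described above. The directory
# argument is a stand-in for the real RAID mount point.
check_writable() {
  probe="$1/.write_probe.$$"
  if touch "$probe" 2>/dev/null && rm -f "$probe" 2>/dev/null; then
    echo 1   # writable
  else
    echo 0   # read-only, missing, or otherwise broken
  fi
}

# Example run against /tmp; a cron job could feed this 0/1 into the SNMP
# tree (e.g. via net-snmp's "extend" mechanism) for the poller to read.
check_writable /tmp
```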
Hi ben_leung,
like any other IT system/server, Splunk needs basic monitoring from the outside.
A good start is certainly to monitor the main Splunk process, splunkd,
and the Splunk helper processes. This could be done by a basic script calling the $SPLUNK_HOME/bin/splunk status
command.
You can also check whether the ports are up and running; a simple telnet to the Splunk ports will do the trick.
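The two checks above can be sketched in a few lines of shell; SPLUNK_HOME and the ports are the common defaults, and nc stands in for the manual telnet:

```shell
#!/bin/sh
# Sketch of the checks described above. SPLUNK_HOME and the ports are the
# usual defaults -- adjust for your install.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

# 1) Process check: "splunk status" exits non-zero when splunkd is down.
if [ -x "$SPLUNK_HOME/bin/splunk" ] && "$SPLUNK_HOME/bin/splunk" status >/dev/null 2>&1; then
  echo "splunkd: running"
else
  echo "splunkd: DOWN (or no Splunk install at $SPLUNK_HOME)"
fi

# 2) Port check: the scripted version of a manual telnet, using nc.
port_open() {
  nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}
for port in 8000 8089; do
  if port_open localhost "$port"; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```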
But also keep in mind that there can be much more involved, like SAN, NFS, the network, and so on.
Hope this helps ...
cheers, MuS
Does it make sense to install cron jobs on 3,000+ machines to watch Splunk? Automation tools like Ansible would be a great way to hit remote hosts at massive scale. How best to check the Splunk process is what I hope others can share. So many tools to try out and test; looking for the perfect solution, hehe.
Sorry to say, but nobody but you will be able to provide the perfect solution for your setup 😉
The SoS app works well.
You can create a search that checks splunkd.log for stopped/started events. See below:
index=_internal source=*splunkd.log host=* component=IndexProcessor ("shutting down: end" OR "Initializing: readonly") | eval restart_status=if(message="shutting down: end","Stopping","Starting")
Currently I have a script that just hits splunkd via the REST API and checks whether there is a response. If not, then the process (or something else) is down. Would there be a better way to check over thousands of hosts?
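One way to keep that same REST probe but scale the sweep is to run it in parallel from a central box. A sketch, where the host list, port, and timeout are placeholders rather than the poster's actual script:

```shell
#!/bin/sh
# Sketch: fan a REST reachability probe out over a host list with
# background jobs. The host list, port, and timeout are placeholders.
probe() {
  # --max-time keeps one dead host from stalling the whole sweep;
  # -k tolerates self-signed certificates on 8089.
  if curl -sk --max-time 5 -o /dev/null "https://$1:8089/services/server/info"; then
    echo "$1 OK"
  else
    echo "$1 DOWN"
  fi
}

for host in localhost; do   # replace with your real host list
  probe "$host" &           # one background probe per host
done
wait                        # collect all probes before exiting
```

Piping the "DOWN" lines into a mail or paging command turns this into a basic alert.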
Or you could use the Splunk on Splunk (SoS) app. I would also advise getting the Add-on (either Windows or Linux).
Splunk has finally answered this problem definitively with v6.2's new Distributed Management Console (DMC):
http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/ConfiguretheMonitoringConsole
What if the instance or host that is running the DMC goes down? And the DMC can only monitor a single SHC, if I'm correct, so we would need multiple DMC setups for multiple SHCs.
Ideally I want a tool that can monitor more than 10 different SHCs, with nodes ranging in the hundreds. And that's not even counting all the forwarders 😞
Really, I just want the basic check of: is splunkd running? If not, please let me know NOW. I'm looking for the check that generates the smallest footprint.
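For the absolute minimum footprint, a single pgrep per interval is about as light as it gets. A sketch — the echo on the failure path is a placeholder for whatever notification command you actually use:

```shell
#!/bin/sh
# Minimal "is splunkd running?" check: one pgrep, one branch. The echo on
# the failure path is a placeholder for your real pager/email command.
if pgrep -x splunkd >/dev/null 2>&1; then
  echo "splunkd up"
else
  echo "splunkd DOWN"
fi
```

Dropped into cron on each host (or pushed out via Ansible, as mentioned above), this costs one process lookup per run.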