Monitoring Splunk

How do I get started monitoring system health on Splunk Enterprise?

jmulcaster_splu
Splunk Employee

We just got Splunk Enterprise up and running, and I'd like some tips on how to tell if it's healthy. Can you get me started, and point me to some resources?

1 Solution

jmulcaster_splu
Splunk Employee

The Splunk Product Best Practices team provided this response. Read more about How Crowdsourcing is Shaping the Future of Splunk Best Practices.

The Splunk Enterprise Monitoring Console is an app included with every Splunk installation. It consists of dashboards, platform alerts, and health checks. It enables Splunk administrators to gain insight into the system health of Splunk Enterprise, including indexing and search performance, OS resource usage, and license usage. But it's not just a stethoscope on system health: the information in the monitoring console also shows how your searches are performing and where you can tune them to make them better!

How the Monitoring Console helps promote a healthy Splunk Enterprise deployment

The monitoring console goes beyond just showing whether your indexers or search heads are up or down. It has a series of dashboards that help you find answers to common problems, such as why users are getting "peer unresponsive" errors or why search performance is slow. These diagnostics can also indicate where you have inefficient searches set up, or whether too many automated reports are running and affecting system performance.

Metrics in the Monitoring Console can also help you know when to scale. If you notice your system performance consistently running at near-peak levels even after optimizing searches, it may be time to add an indexer.

  • Search Usage Statistics: The Search Activity and Search Usage Statistics dashboards can enlighten you on details such as Aggregate Search Runtime, Top 10 Memory-Consuming Searches, and Long-Running Searches. Warning: users of these dashboards have been known to favorite the documentation on how to Write better searches...don't say we didn't warn you!
  • Scheduler activity: The Scheduler Activity dashboards monitor the activity and success rate of the search scheduler. This can help you run traffic control on your scheduled searches and ensure they are efficient and make good use of system resources.
  • License usage: The License Usage dashboard gives you insight into daily indexing volume, license warnings, and the last 30 days of your license usage directly from Splunk Web.
  • Platform alerts: A platform alert is a saved search in the monitoring console that notifies administrators of conditions that might compromise their Splunk Enterprise environment.
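
To make the scaling advice above a bit more concrete, here is a minimal Python sketch of one way to decide whether a system is "consistently running at near-peak levels". It is not part of the Monitoring Console. The function name, the 90% cutoff for "consistently", and the sample values are all assumptions made for illustration; the 80% threshold is borrowed from the red CPU range that the Overview dashboard uses.

```python
from typing import Sequence


def consistently_near_peak(
    cpu_samples: Sequence[float],
    threshold: float = 80.0,    # assumption: "near-peak" = the red range (80% or more)
    min_fraction: float = 0.9,  # assumption: "consistently" = at least 90% of samples
) -> bool:
    """Return True if enough of the CPU samples sit at or above the threshold."""
    if not cpu_samples:
        # No data means no evidence that the system is overloaded.
        return False
    hot = sum(1 for sample in cpu_samples if sample >= threshold)
    return hot / len(cpu_samples) >= min_fraction


# Hypothetical CPU percentages, e.g. copied by hand from a dashboard panel.
busy_indexer = [85, 90, 88, 92, 81, 95, 87, 89, 91, 83]
spiky_indexer = [85, 40, 88, 30]

print(consistently_near_peak(busy_indexer))   # every sample is in the red range
print(consistently_near_peak(spiky_indexer))  # only half the samples are
```

A check like this only flags a candidate for scaling. Per the advice above, optimize searches first and consider adding an indexer only if the load stays high afterward.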

Related video: Creating Alerts in Splunk

How to get started using the Splunk Enterprise monitoring console

  • Set up the Monitoring Console. Monitoring Console not set up yet? Review the Monitoring Console setup documentation, then give it a go!
  • Enable Platform Alerts. Use these actionable alerts to stay ahead of issues that impact the platform and users.
  • Check your system health in the Monitoring Console. View the Monitoring Console and get familiar with the dashboards and the information they show. From the Overview dashboard, check the CPU usage of your indexer(s). Is it in the green (0-59%), orange (60-79%), or red (80% or more) status range? Are there any triggered alerts? From the Topology view under Indexers, toggle to show the indexing rate per second.
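
The CPU status ranges in the last step can be written down as a small Python sketch. This is only an illustration of the thresholds quoted above, not code from the Monitoring Console; the function name and the sample host readings are hypothetical.

```python
def cpu_status(cpu_percent: float) -> str:
    """Map a CPU usage percentage to the status range described above.

    green: 0-59%, orange: 60-79%, red: 80% or more.
    Assumption: fractional values between 59 and 60 count as green,
    and between 79 and 80 as orange.
    """
    if not 0 <= cpu_percent <= 100:
        raise ValueError(f"CPU usage must be between 0 and 100, got {cpu_percent}")
    if cpu_percent >= 80:
        return "red"
    if cpu_percent >= 60:
        return "orange"
    return "green"


# Hypothetical indexer readings, e.g. read off the Overview dashboard.
readings = {"idx1": 42.0, "idx2": 71.5, "idx3": 93.0}
for host, cpu in readings.items():
    print(host, cpu_status(cpu))
```

With the sample readings, idx1 lands in green, idx2 in orange, and idx3 in red, so idx3 is the indexer to investigate first.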

Beyond the Monitoring Console

There is a plethora of community-created apps that take monitoring of Splunk to the next level. Take a peek at the comments on this post to learn more.

adukes_splunk
Splunk Employee

Added related video.

sloshburch
Ultra Champion

I just updated the post to include proper links to the Monitoring Console and a pointer to the discussion here about other community-contributed apps that y'all recommend. Karma coming your way @jacobevans

jacobpevans
Motivator

Howdy @jmulcaster_splunk,

This is what you're looking for: https://docs.splunk.com/Documentation/Splunk/latest/DMC/Monitoringoverview

This is also a really nice app that works as an addition to the built-in monitoring console: https://splunkbase.splunk.com/app/3796/ (Click the "details" tab for configuration instructions).

Note that neither of these will work out-of-the-box in a distributed environment. You need to go through all of the configuration items to have either working properly.

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.

gjanders
SplunkTrust

The GitHub link for Alerts for Splunk Admins is here. As per the README:
"The overall idea behind this application is to provide a variety of alerts that detect issues or potential issues within the splunk log files and then advise via an alert that this has occurred. This application was built as there were a variety of messages in the Splunk console and logs in Splunk that if acted upon could have prevented an issue within the environment.

There are also a few dashboards for investigating indexer performance, heavy forwarder queue usage and data model acceleration issues"

The app has expanded over the years and I would like to continue adding to it. Contributions are always welcome!

One of the app's original goals was to have most of its functionality appear in the monitoring console. Over time, a small number of the alerts have been replaced by monitoring console functionality!

jacobpevans
Motivator

Cheers @gjanders, I really like the app. Keep up the good work! I have it somewhere on my todo list to go through the full app and add in my own things some day.

Cheers,
Jacob


gjanders
SplunkTrust

No problem. If you find something that would benefit many users, feel free to contribute.
