Monitoring Splunk

Please share a short list of preventative tasks a Splunk admin would do daily or weekly to defend the turf. Thank you

SamHTexas
Builder

Please share a list of preventative tasks a Splunk admin would do daily or weekly to defend the turf. Please share SPL searches if you can. Thank you in advance.

SamHTexas
Builder

Thank you very much. If you think of similar defensive measures, please share. Happy Memorial Day 2021.

tscroggins
Builder

@SamHTexas 

EDIT: This list has more to do with platform stability than "defending the turf," but it's much easier to identify problems in an otherwise healthy environment than a sick one.

I generally do the following:

1. Configure the Monitoring Console and enable its platform alerts. If you're using forwarders, configure forwarder monitoring. This should cover basic availability monitoring.

2. Create a report or dashboard quantifying _internal (or app-specific) ERROR and WARN* events by source, component, or whichever category works best for you conceptually. Manage these as defects using quality control tools, e.g. Pareto charts.

3. Identify hosts and sources present today that were not present yesterday, i.e. new sources.

4. Identify hosts and sources present yesterday that are not present today, i.e. missing sources.

5. Identify anomalous changes in event counts across critical hosts and sources.

6. Work with your infrastructure or capacity team (if they're separate functions) to baseline Splunk performance and identify anomalous variances in principal components: CPU, memory, I/O, and storage.
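
The comparison logic behind items 2 through 5 can be sketched outside of Splunk. The Python below is only an illustration and not something from my environment: it assumes you have already exported the ERROR/WARN component names, the (host, source) pairs for each day, and a short history of daily event counts. The function names, the sample values, and the 3-sigma threshold are all hypothetical choices.

```python
from collections import Counter
from statistics import mean, stdev


def pareto(error_components):
    """Item 2: rank components by event count with a cumulative percentage."""
    counts = Counter(error_components)
    total = sum(counts.values())
    rows, running = [], 0
    for component, n in counts.most_common():
        running += n
        rows.append((component, n, round(100 * running / total, 1)))
    return rows


def new_and_missing(yesterday, today):
    """Items 3 and 4: set differences over (host, source) pairs."""
    yesterday, today = set(yesterday), set(today)
    new = sorted(today - yesterday)        # present today, absent yesterday
    missing = sorted(yesterday - today)    # present yesterday, absent today
    return new, missing


def is_anomalous(history, current, threshold=3.0):
    """Item 5: flag a count far from the historical mean.

    The threshold (in standard deviations) is an assumed default; tune it
    per host or source.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    sigma = stdev(history)
    if sigma == 0:
        return current != history[0]
    return abs(current - mean(history)) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    print(pareto(["TcpOutputProc", "TcpOutputProc", "DateParserVerbose"]))
    print(new_and_missing(
        yesterday=[("web01", "/var/log/a"), ("db01", "/var/log/b")],
        today=[("web01", "/var/log/a"), ("app01", "/var/log/c")],
    ))
    print(is_anomalous([100, 102, 98, 101, 99], 500))
```

In practice the inputs would come from scheduled searches, for example a tstats search grouped by host and source for the new/missing comparison, and the same logic can usually be expressed directly in SPL once you know which comparison you want.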

Beyond the basics, you're getting into service quality and quantifying/qualifying user behavior: search performance, search coverage, data retention relative to storage pools, etc.
