Looking at Queue Fill ratios in DMC - which aggregation?

rpquinlan
Path Finder

What's the general consensus or best practice for the "Fill Ratio of Data Processing Queues" panel in the DMC (Indexing --> Indexing Performance: Instance)? Which aggregation is the "best" to use? I don't get alerts about any queues being filled.

Using the default of 'median', everything looks great, all flat-lined.

Using 90th Percentile (as suggested from my first call to support), I can see a few blips on the indexing queue, but nothing major:
[Screenshot: 90th Percentile]

Using "Maximum", there DEFINITELY appears to be an issue:

[Screenshot: Maximum]

I am looking into potential SAN issues, but the indexers run on a lightly loaded host that is Fibre Channel connected to an EMC XtremIO all-flash array. I can't imagine there's really an IOPS problem, but it could be something on the host/guest. We don't have any TCP/syslog going out from the indexers; they just write to disk. But anyway, this is more about which view is 'best' to use...

eregon
Path Finder

There is no general consensus or best practice on which one to use; it depends on what you want to find out. To choose the aggregation properly, you need to understand what it means. It is really just maths.

Splunk records fill ratio values on a per-minute basis (or maybe every few seconds, I am not sure about that), but the graph presents them aggregated. That is, all the values logged within a given time window (5 min, 1 h, 1 d, ...) are collapsed into a single value that is shown to the user in the graph.

In other words, if you choose to display the maximum, you get the upper limit: you know the queue fill ratio did not exceed this value during the respective time window. That could be useful, say, to prove your hardware is such overkill that your queues can never get full.
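To make the difference concrete, here is a minimal Python sketch of what each aggregation does to the same window of samples. The sample values, the window size, and the function name `aggregate` are made up for illustration, and the 90th percentile uses the simple nearest-rank method, which may not match exactly how Splunk computes its percentiles.

```python
import statistics


def aggregate(samples, window):
    """Collapse each run of `window` fill-ratio samples (in percent)
    into one median, one 90th percentile and one maximum."""
    rows = []
    for start in range(0, len(samples), window):
        chunk = sorted(samples[start:start + window])
        # Nearest-rank 90th percentile: ceil(0.9 * n), in integer arithmetic.
        rank = (9 * len(chunk) + 9) // 10
        rows.append({
            "median": statistics.median(chunk),
            "p90": chunk[rank - 1],
            "max": chunk[-1],
        })
    return rows


# Hypothetical data, three windows of ten samples each:
one_spike = [0] * 9 + [100]        # queue full for a single sample
some_bursts = [0] * 7 + [100] * 3  # queue full for 3 samples out of 10
mostly_full = [0] * 4 + [100] * 6  # queue full most of the time

for row in aggregate(one_spike + some_bursts + mostly_full, window=10):
    print(row)
```

With these numbers a single full-queue sample shows up only in the maximum, a few bursts also lift the 90th percentile, and the median moves only once the queue is full for more than half of the window. That matches the pattern in the question: flat median, a few blips at the 90th percentile, and an alarming maximum.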

To check that you have no I/O trouble, average, median, or 90th percentile are much more appropriate.

davpx
Communicator

90th percentile is what I usually use. Max is pretty misleading.
