Splunk metrics and counters

peiffer · 11-09-2020

What is the appropriate way to calculate metric rates on counters and sum them, either for a single-value panel or for a timechart? What does the rate() of a metric mean: rate per sample or rate per second? I am looking for guidance.

I am extracting bind9 stats from our dozen recursive DNS servers every 5 minutes. The stats are counters. I am searching with a 10-minute span so that each span contains 2 samples for rate calculations.
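To make the question concrete, here is a minimal Python sketch of how a per-second rate is conventionally derived from two samples of an ever-increasing counter, and how per-host rates would add up to a fleet-wide figure. The host names, sample values, and the helper `per_second_rate` are made up for illustration. Whether rate() in mstats computes exactly this is the open question, so treat the sketch as an assumption, not as a description of Splunk's behavior.

```python
def per_second_rate(earliest, latest):
    """Rate between two counter samples, each given as (epoch_seconds, counter_value)."""
    (t0, v0), (t1, v1) = earliest, latest
    if t1 <= t0:
        raise ValueError("need two samples with increasing timestamps")
    if v1 < v0:
        raise ValueError("counter went backwards (possible reset)")
    return (v1 - v0) / (t1 - t0)


# Hypothetical samples taken 300 s apart, i.e. two 5-minute polls in one 10-minute span.
samples = {
    "dns01": ((0, 1_000_000), (300, 1_090_000)),  # 90,000 queries in 300 s
    "dns02": ((0, 5_000_000), (300, 5_120_000)),  # 120,000 queries in 300 s
}

# One rate per host, then the sum across hosts for a single-value panel.
per_host = {host: per_second_rate(a, b) for host, (a, b) in samples.items()}
total_qps = sum(per_host.values())

print(per_host)
print(total_qps)
```

If rate() works this way, the per-host values are already in queries per second and the fleet-wide figure is simply their sum, with no further division by the polling interval.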

Base search:
| mstats rate(QrySuccess) as QrySuccess rate(QryFailure) as QryFailure rate(QrySERVFAIL) as QrySERVFAIL rate(QryFORMERR) as QryFORMERR
rate(QryNXDOMAIN) as QryNXDOMAIN rate(QryRecursion) as QryRecursion
prestats=false WHERE index="test_network_metrics" AND host="*" span=10m by host
| fields *

SingleStat Panel
| fields QrySuccess
| eval Success=QrySuccess/300
| stats sum(Success)

Timechart Panel
| fields QrySuccess host
| timechart span=10m latest(QrySuccess) as Success by host

Screen Shot 2020-11-02 at 6.24.55 AM.png

The numbers don't look right: at peak I expect traffic on the order of thousands of queries per second, so I think I botched the stats. System wide, I am running about 14M queries per hour (qph), or about 3900 queries per second (qps). If I leave off the division by 300 (meant to convert 5 minutes to 1 second), the result looks closer to normal, at about 30% of what I am expecting. Below is what I get from processing hourly summaries of DNS query transaction logs.

Screen Shot 2020-11-02 at 6.06.41 AM.png
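As a sanity check on the arithmetic quoted above, the unit conversions can be sketched in a few lines of Python. The constants come from the figures in this post; the variable names are illustrative only, and the interpretation in the comments is a hypothesis (that the search result is already per second), not something confirmed here.

```python
SECONDS_PER_HOUR = 3600
SAMPLE_INTERVAL_S = 300  # 5-minute polling interval

expected_qph = 14_000_000                        # system-wide queries per hour
expected_qps = expected_qph / SECONDS_PER_HOUR   # roughly 3889 queries per second

# Without the division by 300 the panel shows about 30% of the expected rate.
observed_without_division = 0.30 * expected_qps

# With the division by 300 the same value shrinks by another factor of 300,
# which would explain single-digit results if the input was already per second.
observed_with_division = observed_without_division / SAMPLE_INTERVAL_S

print(round(expected_qps))
print(round(observed_without_division))
print(round(observed_with_division, 2))
```

Under that hypothesis the division by 300 is what pushes the panel far below the expected thousands per second, while the remaining 30% shortfall has to come from somewhere else in the search.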


I also experimented with summing latest() of the target field, but the numbers come out about the same.
