Splunk metrics and counters


What is the appropriate way to sum metric rates computed from counters, either for a single stat or for a timechart?  What does the rate() of a metric mean: rate per sample or rate per second?  I am looking for guidance.

I am extracting bind9 stats from our dozen recursive DNS servers every 5 minutes.  The stats are counters. I am querying them in 10-minute spans so that each span holds 2 samples for the rate calculation.

Base search:
| mstats rate(QrySuccess) as QrySuccess rate(QryFailure) as QryFailure rate(QrySERVFAIL) as QrySERVFAIL rate(QryFORMERR) as QryFORMERR
    rate(QryNXDOMAIN) as QryNXDOMAIN rate(QryRecursion) as QryRecursion
    prestats=false  WHERE index="test_network_metrics" AND host="*" span=10m by host
| fields *

SingleStat Panel
| fields QrySuccess
| eval Success=QrySuccess/300
| stats sum(Success)

Timechart Panel
| fields QrySuccess host
| timechart span=10m latest(QrySuccess) as Success by host

[Screenshot: Screen Shot 2020-11-02 at 6.24.55 AM.png]

The numbers don't look right: at peak I am expecting traffic on the order of thousands of queries per second.  I think I botched the stats.  System-wide, I am running about 14M qph, or about 3900 qps.  If I leave off the division by 300 (to convert 5 minutes to 1 second), it looks closer to normal, at about 30% of what I am expecting.  Below is what I get from processing hourly summaries of DNS query transaction logs.

[Screenshot: Screen Shot 2020-11-02 at 6.06.41 AM.png]
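
To make the unit question concrete, here is a minimal Python sketch of the arithmetic behind the figures above. It is only a sanity check, not a statement of how Splunk computes rate(): the counter readings and the helper name per_second_rate are hypothetical, and whether rate() already returns a per-second value is exactly what I am unsure about.

```python
# Sanity check on units; all counter readings below are hypothetical.

SECONDS_PER_HOUR = 3600
SAMPLE_INTERVAL_S = 300  # bind9 stats are collected every 5 minutes


def per_second_rate(earlier: int, later: int, elapsed_s: float) -> float:
    """Per-second rate of an increasing counter between two samples."""
    return (later - earlier) / elapsed_s


# System-wide expectation from the transaction logs: about 14M queries/hour.
expected_qps = 14_000_000 / SECONDS_PER_HOUR

# Hypothetical QrySuccess counter readings from one host, 5 minutes apart.
host_qps = per_second_rate(1_000_000, 1_097_500, SAMPLE_INTERVAL_S)

# If a value is already per second, dividing by 300 again shrinks it 300-fold.
double_divided = host_qps / SAMPLE_INTERVAL_S

print(f"expected system-wide qps: {expected_qps:.0f}")
print(f"one host, per-second rate: {host_qps:.1f}")
print(f"same value divided by 300 again: {double_divided:.3f}")
```

If rate() does turn out to be per second, the extra division by 300 in my SingleStat search would account for values that are far too small, and the per-host rates would simply need to be summed.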


I experimented with summing the latest value of the target field across hosts, but the numbers come out about the same:
| fields QrySuccess host
| fillnull value=0.0 QrySuccess
| stats latest(QrySuccess) as Success by host
| addcoltotals labelfield=host fieldname=Success
| tail 1
| fields Success
