Here's an example snippet of the logs I'm working with:
2018-04-17 18:26:02 app=test-app, env=qa, total_msg=0
2018-04-17 18:25:02 app=test-app, env=qa, total_msg=60
2018-04-17 18:24:02 app=test-app, env=qa, total_msg=0
2018-04-17 18:23:02 app=test-app, env=qa, total_msg=100
2018-04-17 18:22:02 app=test-app, env=qa, total_msg=50
I'd like to create alerts and a dashboard around these metrics. I've been attempting to use delta, but it's returning a negative number for my 'msg_proc' value. The query I'm using is:
index=myindex sourcetype=mymetrics environment="qa" app=test-app
| bucket _time span=1m
| stats sum(total_msg) as current by _time, app
| delta current
| rename delta(current) as msg_proc
The above query results in the output below:
_time,app,current,msg_proc
2018-04-17T14:59:00,test-app,0,0
2018-04-17T15:00:00,test-app,0,0
2018-04-17T15:01:00,test-app,42,0
2018-04-17T15:02:00,test-app,27,-15
2018-04-17T15:03:00,test-app,35,8
2018-04-17T15:04:00,test-app,21,-14
2018-04-17T15:05:00,test-app,3,-18
2018-04-17T15:06:00,test-app,1,-2
2018-04-17T15:07:00,test-app,1,0
2018-04-17T15:08:00,test-app,1,0
2018-04-17T15:09:00,test-app,1,0
As expected, the delta shows a negative number whenever messages are processed (the count drops from the previous value). I'd like to send an alert when current is > 0 and messages are not being consumed within a 5-minute period (or when consumption is slow). I'm struggling to define good logic to alert on this condition. If possible, I'd also like to create a dashboard that shows the msg_proc rate for each queue. Hope this makes sense!
I'm not sure delta will give you anything useful here. For a processing rate, try the avg function with stats or timechart.
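As a rough sketch of the averaging suggestion, reusing the index, sourcetype, and field names from the question: timechart can plot the average total_msg per time bucket for each app. The 5-minute span is an assumption chosen to match the alert window in the question. Note that this charts the average queue depth rather than a true consumption rate, so treat it as a starting point for the dashboard panel.

```spl
index=myindex sourcetype=mymetrics environment="qa" app=test-app
| timechart span=5m avg(total_msg) by app
```

With more than one app in the base search, the by clause should give one series per app on the same chart.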
It looks like you are pulling data at 60-second intervals, so maybe try working with streamstats and its window or time_window arguments. More about the command here:
http://docs.splunk.com/Documentation/Splunk/7.0.3/SearchReference/Streamstats
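One possible way to apply streamstats to the alert condition from the question, as a sketch only. It assumes one result per minute, and it assumes "not being consumed" means the count never dropped during the last five one-minute buckets. The field names samples, min_current, and biggest_drop are made up for illustration.

```spl
index=myindex sourcetype=mymetrics environment="qa" app=test-app
| bucket _time span=1m
| stats sum(total_msg) as current by _time, app
| delta current as msg_proc
| streamstats window=5 count as samples min(current) as min_current min(msg_proc) as biggest_drop
| where samples=5 AND min_current>0 AND biggest_drop>=0
```

Here msg_proc is negative whenever the count fell, so biggest_drop>=0 means no decrease was seen anywhere in the window, and min_current>0 means messages were waiting the whole time. The alert could then trigger when this search returns any results. Since delta has no by clause, this version only fits a search restricted to a single app, as in the original query.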