I'm trying to monitor any sudden drops/increases in my WebLogic queue. I can get a search easily enough to visualise it - I'm just having a hard time formatting it into something I can alert on.
Here's the visual search:
host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart span=10m count | delta count as difference
I thought maybe adding some eval at the end would work - which it kind of does. I do get a percentage, I'm just not sure what I can do next. I'd like the alert to trigger if there is a 50% change (positive/negative).
host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart span=10m count | delta count as difference | eval percdif=(difference/count)*100 | eval percdif=round(percdif,0)
Any help would be appreciated.
I'd simplify your statement a touch:
host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart span=10m count | delta count as difference | eval percdif=round(abs(difference/count)*100,0)
You can then alert when percdif > 50.
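As a rough sketch of what that alert condition could look like: the filter can go straight into the search, so that it only returns a row when the threshold is crossed, and the saved search can then be set to trigger when the number of results is greater than zero. The 50 is just the threshold from the question.

```spl
host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m
| timechart span=10m count
| delta count as difference
| eval percdif=round(abs(difference/count)*100,0)
| where percdif > 50
```

Because abs() is already applied, a single comparison covers both a rise and a drop.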
Without knowing your data, though (and knowing that this may be very obvious to you already), note that the above will alert on any sudden drops or increases in the number of times that message is logged, which will not necessarily equal your queue length. If that full message contains a QueueLength field, or anything like that, you might get more useful information by going for that field:
host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart last(QueueLength) as CurrentQueueLength span=10m | delta CurrentQueueLength as difference | eval percdif=round(abs(difference/CurrentQueueLength)*100,0)
Splunk put up a page with all the functions that are available in eval. It is quite helpful: http://www.splunk.com/base/Documentation/latest/SearchReference/CommonEvalFunctions
Very nice, thanks, David. I didn't realize that abs() existed.
Hmm, I think I have it. Maybe I could get a spot check?
host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart span=10m count | delta count as difference | eval percdif=(difference/count)*100 | eval percdif=round(percdif,0) | where percdif < -50 OR percdif > 50
Then I schedule a job every 10 minutes?
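One thing worth checking before scheduling it: delta needs at least two timechart rows to compare, and timechart snaps its buckets to 10-minute boundaries, so an earliest=-10m search may give you only one complete bucket (or two partial ones). Also, as far as I know, eval returns null when dividing by zero, so a drop to zero events would produce no percdif at all. Here is a rough sketch that works around both. It assumes the job runs on a cron schedule like */10 * * * *, and that "a 50% change" means a change relative to the previous bucket; prev is just a field name made up for the example.

```spl
host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-20m@m latest=@m
| timechart span=10m count
| streamstats current=f window=1 last(count) as prev
| eval percdif=round(abs(count-prev)/prev*100,0)
| where percdif > 50
```

With this window the search should see two full 10-minute buckets. The first row has no prev and is filtered out, so only the latest bucket can trigger the alert. If the previous bucket was empty, prev is 0 and percdif is still null, so a jump from zero would need its own condition.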