Hey folks,
I'm trying to visualize the run times of backup processes within our Exchange environment. I monitor the CommVault backup process via Nagios, so I have data at roughly 5-minute intervals on whether the backup process is running or not.
Standard disclaimer - I'm not real great with searches and stats, so please speak up if there's a better way to do this. I'm positive there is. 🙂
My current search is:
index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| transaction startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK") by host_name
That gives me events like:
[SERVICEPERFDATA] 1540988387 dc2-p-xmail-03 commvault backup process 1.570 0.000 CRITICAL: exTiDbBackup.exe: started (critical) 'exTiDbBackup.exe'=1;1;2
[SERVICEPERFDATA] 1540988685 dc2-p-xmail-03 commvault backup process 0.315 0.000 OK: exTiDbBackup.exe: 0 'exTiDbBackup.exe'=0;1;2
[SERVICEPERFDATA] 1540988085 dc2-p-xmail-03 commvault backup process 0.324 0.000 CRITICAL: exTiDbBackup.exe: started (critical) 'exTiDbBackup.exe'=1;1;2
[SERVICEPERFDATA] 1540988986 dc2-p-xmail-03 commvault backup process 0.332 0.000 OK: exTiDbBackup.exe: 0 'exTiDbBackup.exe'=0;1;2
From these transactions, I can create a table that the Timeline visualization can deal with:
index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| transaction startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK") by host_name
| table _time host_name duration
| sort host_name
The resulting Timeline viz looks like:
Yay, look at me go! 🙂 That is exactly what I'd like to see: when the backup processes were running on each Exchange host, plotted by time. It makes it much easier for our team's Exchange folks to see when backups were running across all the hosts. However, that viz was done with a "last 7 days" time range. When I zoom into the last 24 hours, the 5-minute polling interval of Nagios makes things ugly:
Sad trombone. I bet I can use streamstats to "smooth in" the events between Nagios polls, but I haven't struck upon a useful method yet. I've been through the streamstats docs a number of times, but I struggle sometimes with written documentation (I learn much better by example), so I don't think I'm "getting it".
Can someone give me a hand with this? I'm also not really convinced that 'transaction' is a good way to go, but I'm definitely a newbie when it comes to that. All assistance is greatly appreciated.
Thanks folks!
Chris
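If the backup process has an ID and you have the option to log it, you can build the transaction on that field using transaction ProcessID startswith....
Grouping on the process ID rather than only on host_name should pair each start with its matching end, which may solve your problem.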
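Since you mentioned you learn better by example, here is a rough streamstats sketch that avoids transaction altogether. It reuses the index and field names from your search. prev_running and end_time are just names I picked for illustration. I have not run this against your data, so treat it as a starting point. One assumption to verify: as far as I recall, the Timeline viz expects duration in milliseconds, while the duration field from transaction is in seconds, which is why the last eval multiplies by 1000.

```
index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=if(service_status=="OK", 0, 1)
| sort 0 host_name _time
| streamstats current=f window=1 global=f last(backups_running) as prev_running by host_name
| where isnull(prev_running) OR backups_running!=prev_running
| sort 0 host_name -_time
| streamstats current=f window=1 global=f last(_time) as end_time by host_name
| where backups_running==1 AND isnotnull(end_time)
| eval duration=(end_time-_time)*1000
| table _time host_name duration
| sort host_name
```

The first streamstats copies the previous poll's state onto each event, and the where keeps only the polls where the state flipped. After reversing the sort, the second streamstats gives each remaining event the timestamp of the next state change on that host. For a "running" row that is the first OK poll after the backup started. A backup still running at the end of the search window has no end_time and is dropped by the final where.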