Using Timeline visualization to show process runtimes - filling in gaps between events

bensec01

Hey folks,

I'm trying to visualize the run times of backup processes within our Exchange environment. I monitor the CommVault backup process via Nagios, so I have data at approximately 5-minute intervals on whether the backup process is running or not.

Standard disclaimer - I'm not great with searches and stats, so please speak up if there's a better way to do this. I'm positive there is. 🙂

My current search is:

index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| transaction startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK") by host_name

That gives me events like:

[SERVICEPERFDATA] 1540988387    dc2-p-xmail-03  commvault backup process    1.570   0.000   CRITICAL: exTiDbBackup.exe: started (critical) 'exTiDbBackup.exe'=1;1;2
[SERVICEPERFDATA] 1540988685    dc2-p-xmail-03  commvault backup process    0.315   0.000   OK: exTiDbBackup.exe: 0 'exTiDbBackup.exe'=0;1;2

[SERVICEPERFDATA] 1540988085    dc2-p-xmail-03  commvault backup process    0.324   0.000   CRITICAL: exTiDbBackup.exe: started (critical) 'exTiDbBackup.exe'=1;1;2
[SERVICEPERFDATA] 1540988986    dc2-p-xmail-03  commvault backup process    0.332   0.000   OK: exTiDbBackup.exe: 0 'exTiDbBackup.exe'=0;1;2

From these transactions, I can create a table that the Timeline visualization can deal with:

index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| transaction startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK") by host_name
| table _time host_name duration
| sort host_name
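
A detail that may matter at this step: `transaction` reports `duration` in seconds, while the Timeline visualization, as far as its documentation describes, reads the duration column as milliseconds. If that holds, a five-minute run would be drawn as a 0.3-second sliver, which is easy to miss at a 24-hour zoom. A minimal sketch of the same search with the conversion added follows; the `* 1000` step is the only change, and it rests on that assumption about the viz, so it is worth checking against the Timeline app docs:

```
index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| transaction startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK") by host_name
| eval duration=duration * 1000
| table _time host_name duration
| sort host_name
```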

The resulting Timeline viz looks like:

[Image: Timeline visualization of backup runs per host, last 7 days]

Yay, look at me go! 🙂 That is exactly what I'd like to see: when the backup processes were running on each Exchange host, plotted by time. It makes it much easier for our team's Exchange folks to see when backups were running across all the hosts. However, that viz was done with a "last 7 days" time period. When I zoom into the last 24 hours, the 5-minute polling interval of Nagios makes things ugly:

[Image: the same Timeline visualization zoomed to the last 24 hours]

Sad trombone. I bet I can use streamstats to "smooth in" the events between Nagios polls, but I haven't struck upon a useful method yet. I've been through the streamstats docs a number of times, but I struggle sometimes with written documentation (I learn much better by example), so I don't think I'm "getting it".

Can someone give me a hand with this? I'm also not really convinced that 'transaction' is a good way to go, but I'm definitely a newbie when it comes to that. All assistance is greatly appreciated.

Thanks folks!

Chris
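
For the streamstats idea raised in the question, one possible shape is sketched below. It is not a confirmed fix for this environment, only an illustration under stated assumptions. It replaces `transaction` with two `streamstats` passes: the first looks at the previous poll per host to spot where a run begins, the second numbers the runs. The field names `prev_running`, `run_start`, `run_id`, and `last_seen` are made up for this example. The `+ 300` pads each run out to the next expected poll and assumes the roughly 5-minute Nagios interval mentioned in the post. The `* 1000` assumes the Timeline viz wants duration in milliseconds.

```
index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| sort 0 host_name _time
| streamstats current=f window=1 last(backups_running) as prev_running by host_name
| eval run_start=if(backups_running==1 AND (isnull(prev_running) OR prev_running==0), 1, 0)
| streamstats sum(run_start) as run_id by host_name
| where backups_running==1
| stats min(_time) as _time max(_time) as last_seen by host_name run_id
| eval duration=(last_seen - _time + 300) * 1000
| table _time host_name duration
| sort host_name
```

Each output row is one continuous run of non-OK polls for a host, so consecutive CRITICAL polls collapse into a single bar instead of separate points. A run that is seen in only one poll still gets a 300-second bar because of the padding; drop the `+ 300` if that overstates things.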

rapmancz

If the backup process has an ID and you are able to log it, you can build the transaction on that field: transaction ProcessID startswith....
That should solve your problem.
