Uptime percentage using IBM Impact logs

viksinha
Explorer

Our monitoring tool keeps sending a Down alert at a fixed interval while the application is down. Once the application comes back up, the tool sends only one UP alert. So for each outage we get multiple Down events and a single UP event. The output looks like this (newest first):

UP    2014-05-13 08:08
Down  2014-05-13 07:58
Down  2014-05-13 07:48
Down  2014-05-13 07:38
Down  2014-05-13 07:28
UP    2014-05-12 16:08   # UP event of the previous outage
Down  2014-05-12 15:58
...

Is there any way to calculate uptime from this data? Apps like pinger can also report uptime, but along with uptime we need to extract other fields from our data source for further reports.

So far, the logic we have in mind is this: a loop goes through every record in the output. When it finds an UP event, it takes that event's time and the time of the next Down event, and adds the difference to a running total. This repeats for every UP-Down pair.
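A minimal Python sketch of that loop, useful for cross-checking numbers offline. It assumes each event has already been reduced to a (status, timestamp) pair with the status spelled "UP" or "Down" as in the sample above; `total_uptime` and `uptime_percentage` are hypothetical names, not part of Splunk or IBM Impact. Events are sorted oldest-first, an UP opens an up interval, and the first Down after it closes the interval, so the repeated Down alerts of one outage are ignored.

```python
from datetime import datetime, timedelta


def total_uptime(events, window_end=None):
    """Sum every UP -> next Down interval.

    events: iterable of (status, datetime) pairs (hypothetical shape).
    window_end: optional; an UP with no later Down counts as up until then.
    """
    uptime = timedelta(0)
    up_since = None  # time of the UP that opened the current up interval
    for status, ts in sorted(events, key=lambda e: e[1]):  # oldest first
        if status == "UP" and up_since is None:
            up_since = ts
        elif status == "Down" and up_since is not None:
            # The first Down after an UP closes the interval; later Down
            # alerts for the same outage find up_since == None and are skipped.
            uptime += ts - up_since
            up_since = None
    if up_since is not None and window_end is not None:
        uptime += window_end - up_since
    return uptime


def uptime_percentage(events, window_start, window_end):
    """Uptime as a percentage of [window_start, window_end] (hypothetical helper)."""
    inside = [(s, t) for s, t in events if window_start <= t <= window_end]
    return 100 * total_uptime(inside, window_end) / (window_end - window_start)


# Sample data from the question above.
FMT = "%Y-%m-%d %H:%M"
sample = [
    ("UP", "2014-05-13 08:08"),
    ("Down", "2014-05-13 07:58"),
    ("Down", "2014-05-13 07:48"),
    ("Down", "2014-05-13 07:38"),
    ("Down", "2014-05-13 07:28"),
    ("UP", "2014-05-12 16:08"),
    ("Down", "2014-05-12 15:58"),
]
events = [(s, datetime.strptime(t, FMT)) for s, t in sample]

print(total_uptime(events))  # 15:20:00
print(round(uptime_percentage(events, events[-1][1], events[0][1]), 2))  # 94.85
```

For the percentage, this sketch treats the time before the first UP in the window as not up, since the application's state there is unknown from the alerts alone.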

Thanks in advance!


MuS
SplunkTrust

Hi viksinha,

I like using streamstats for this kind of use case. First you need a field containing the UP or Down message; let's assume it is called status. Splunk returns events newest-first and streamstats processes them in that order, so sort them oldest-first; then each event is compared with the one just before it in time. tostring(..., "duration") formats the difference in seconds as HH:MM:SS. So you could use something like this:

your base search to get the events here
| sort 0 _time
| streamstats current=f last(status) AS last_status last(_time) AS last_time
| where last_status="Down" AND status="UP"
| eval downtime=tostring(_time - last_time, "duration")
| table status downtime
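To see what the streamstats pairing computes, here is a plain-Python analogue (an illustration, not Splunk code). It assumes events are (status, timestamp) pairs; `pairwise_downtime` is a hypothetical name. Like the query, it sorts oldest-first and then compares each event only with the event immediately before it.

```python
from datetime import datetime


def pairwise_downtime(events):
    """Pair each event with the one just before it in time.

    Mirrors the idea of `streamstats current=f last(status) last(_time)` after
    sorting oldest-first; a Down immediately followed by an UP yields the gap
    in seconds. events: (status, datetime) pairs (hypothetical shape).
    """
    ordered = sorted(events, key=lambda e: e[1])  # oldest first
    rows = []
    for (last_status, last_time), (status, ts) in zip(ordered, ordered[1:]):
        if last_status == "Down" and status == "UP":
            rows.append((ts, int((ts - last_time).total_seconds())))
    return rows


# Sample data from the question.
FMT = "%Y-%m-%d %H:%M"
sample = [
    ("UP", "2014-05-13 08:08"),
    ("Down", "2014-05-13 07:58"),
    ("Down", "2014-05-13 07:48"),
    ("Down", "2014-05-13 07:38"),
    ("Down", "2014-05-13 07:28"),
    ("UP", "2014-05-12 16:08"),
    ("Down", "2014-05-12 15:58"),
]
events = [(s, datetime.strptime(t, FMT)) for s, t in sample]

for up_time, downtime in pairwise_downtime(events):
    print(up_time, downtime)
# 2014-05-12 16:08:00 600
# 2014-05-13 08:08:00 600
```

Both outages come out as 600 seconds because each UP is compared only with the last Down alert before it. The second outage's first Down alert was at 07:28, 40 minutes before the UP, so when alerts repeat, this pairing measures the time since the last alert rather than the full outage.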

Hope this helps to get you started.

cheers, MuS

viksinha
Explorer

Finally solved it with the transaction command, using only the endswith option. The total uses eventstats rather than stats: stats would collapse the results into a single Total_Downtime row and leave the other table columns empty.

sourcetype=IBM_Logs "Status=DOWN" OR "Status=UP" URL=http://video.intranet.com/
| convert timeformat="%F %H:%M:%S" ctime(_time) AS ReportTime
| dedup ReportTime
| transaction URL endswith="UP"
| sort -ReportTime
| rename duration AS Downtime_in_Sec
| eval Downtime_in_Mins=round(Downtime_in_Sec/60,2)
| eval Downtime_in_Hrs=round(Downtime_in_Sec/3600,2)
| eventstats sum(Downtime_in_Hrs) AS Total_Downtime
| table URL Status ReportTime Downtime_in_Sec Downtime_in_Mins Downtime_in_Hrs eventcount Total_Downtime
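The transaction result can be cross-checked the same way. Below is a rough Python analogue for a single URL, assuming (status, timestamp) pairs; `outages` and `uptime_pct` are hypothetical names, and this is not Splunk's implementation. Each outage runs from its first Down alert to the next UP, matching how transaction's duration spans the first to the last event of the group, and the uptime percentage is taken as 100 * (1 - total downtime / window length).

```python
from datetime import datetime


def outages(events):
    """Group Down alerts into outages that end at the next UP.

    Rough analogue of `transaction URL endswith="UP"` for one URL.
    events: (status, datetime) pairs (hypothetical shape).
    Returns (start, end, duration_sec, eventcount) per outage; Down alerts
    with no UP yet (an ongoing outage) are left out.
    """
    result = []
    downs = []  # Down alerts seen since the last UP
    for status, ts in sorted(events, key=lambda e: e[1]):  # oldest first
        if status == "Down":
            downs.append(ts)
        elif status == "UP" and downs:
            start = downs[0]  # first Down alert of this outage
            seconds = int((ts - start).total_seconds())
            result.append((start, ts, seconds, len(downs) + 1))
            downs = []
    return result


def uptime_pct(events, window_start, window_end):
    """100 * (1 - downtime / window); assumes outages lie inside the window."""
    downtime = sum(seconds for _, _, seconds, _ in outages(events))
    return 100 * (1 - downtime / (window_end - window_start).total_seconds())


# Sample data from the question.
FMT = "%Y-%m-%d %H:%M"
sample = [
    ("UP", "2014-05-13 08:08"),
    ("Down", "2014-05-13 07:58"),
    ("Down", "2014-05-13 07:48"),
    ("Down", "2014-05-13 07:38"),
    ("Down", "2014-05-13 07:28"),
    ("UP", "2014-05-12 16:08"),
    ("Down", "2014-05-12 15:58"),
]
events = [(s, datetime.strptime(t, FMT)) for s, t in sample]

for start, end, seconds, count in outages(events):
    print(start, end, seconds, count)
# 2014-05-12 15:58:00 2014-05-12 16:08:00 600 2
# 2014-05-13 07:28:00 2014-05-13 08:08:00 2400 5
print(round(uptime_pct(events, events[-1][1], events[0][1]), 2))  # 94.85
```

On the sample data this gives the same percentage as summing the UP-to-next-Down intervals, since within the window uptime and downtime add up to the window length.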


MuS
SplunkTrust

Please mark this as answered by clicking the tick. Thanks!


viksinha
Explorer

Thanks, that helped! I'm refining the query further to get more presentable output.
