How to Report System Uptime/Downtime?

Explorer

I'm trying to create a system uptime/downtime report under the following conditions:

  • Server starts up, logs a message: "server X starting up"
  • Some critical transactions start to fail but the server remains up. For the purposes of reporting the server is considered down. Log entries start appearing, such as: "transaction Y failed"
  • At some point system functionality has to be restored by either restarting the server or by terminating the process. As such, we can't rely on a graceful shutdown message appearing in the logs.

So, the downtime can be measured as the time between the first and last occurrence of "transaction Y failed" within each pair of consecutive "server X starting up" messages.

I'm looking for suggestions on how to create this report. I can determine all of the information via manual searches, but I'd rather automate the process.


Re: How to Report System Uptime/Downtime?

SplunkTrust

Try this:

index=yourindex sourcetype=yoursourcetype "server X starting up" OR "transaction * failed"
| rex <<field extraction for message, if not already extracted>>
| sort 0 _time
| eval type=if(like(message,"server % starting up"),"Up","Down")
| streamstats current=f window=1 first(type) as prevType
| eval include=if(type=prevType,"N","Y")
| where include="Y"

This keeps only the events where the type changes, so you should be left with the "server X starting up" events and the first "transaction Y failed" event after each startup. After that, you can use the transaction command to calculate the duration between them, which will be your downtime.
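
A rough sketch of that last step, to be appended to the search above. It assumes the type field from the eval is still present and that only the alternating Up/Down events remain. As far as I know, transaction expects its input in descending time order, hence the extra sort. The duration field is the one transaction adds to each grouped event; downtime_minutes is just a name picked for this example. Treat it as a starting point rather than a tested search.

```spl
| sort 0 -_time
| transaction startswith=eval(type="Down") endswith=eval(type="Up")
| eval downtime_minutes=round(duration/60, 1)
| table _time downtime_minutes
```

Each resulting row should span one outage, from the first "transaction Y failed" event to the next "server X starting up" event. Note that this measures downtime up to the restart, not up to the last failure message.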


Re: How to Report System Uptime/Downtime?

Explorer

I'm getting an error with the like clause. It appears to only want 2 parameters:

like(message,"server % starting up","Up","Down")


Re: How to Report System Uptime/Downtime?

SplunkTrust

There was a syntax error in the like() call: the closing parenthesis was in the wrong place. I've updated the query in my answer.
