How to calculate uptime percentage based on my data?

Explorer

Let's say my data looks like this:

8/27/12 10:30:00.000 AM server=test1 and status=Down
8/27/12 10:29:00.000 AM server=test2 and status=Up
8/27/12 10:28:00.000 AM server=test3 and status=Down
8/27/12 10:27:00.000 AM server=test4 and status=Up
8/27/12 10:26:00.000 AM server=test1 and status=Up
8/27/12 10:25:00.000 AM server=test2 and status=Down
8/27/12 10:24:00.000 AM server=test3 and status=Up
8/27/12 10:23:00.000 AM server=test4 and status=Down

I want to calculate the total uptime percentage for each server: total uptime (the sum of all time differences between an Up status and the next Down status) divided by the total time since Splunk received the first status message for that server.
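To make the arithmetic in the question concrete, here is a minimal Python sketch of the calculation for a single server, outside of Splunk. The function name `uptime_percent` and the `window_end` parameter are illustrative assumptions; the question does not say where the total time ends, so the sketch assumes it runs to the end of the search window and that a server whose last status is Up stays up until then.

```python
from datetime import datetime

def uptime_percent(events, window_end):
    """Uptime % for one server.

    events: list of (timestamp, status) tuples, in any order.
    window_end: assumed end of the measured period (not given in the question).
    """
    events = sorted(events)
    if not events:
        return None
    start = events[0][0]  # first status message received for this server
    total = (window_end - start).total_seconds()
    if total <= 0:
        return None
    # Each status lasts until the next event, or until window_end for the last one.
    next_times = [t for t, _ in events[1:]] + [window_end]
    up_seconds = 0.0
    for (t, status), nxt in zip(events, next_times):
        if status == "Up":
            up_seconds += (nxt - t).total_seconds()
    return 100.0 * up_seconds / total

# The two test1 events from the sample data.
test1 = [
    (datetime(2012, 8, 27, 10, 30), "Down"),
    (datetime(2012, 8, 27, 10, 26), "Up"),
]
window_end = datetime(2012, 8, 27, 10, 34)  # hypothetical end of the search window
print(uptime_percent(test1, window_end))  # 240 s up out of 480 s -> 50.0
```

Because every status is held until the next event, two consecutive Up events are simply counted as one longer stretch of uptime.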

Communicator

What did you end up doing for this? I'm trying to do the same calculation, but using this search:

index=_index source=*splunkd.log (event_message="*Splunkd starting*" OR event_message="*Shutting down splunkd")

Super Champion

This might be a good starting point:

| sort 0 server _time | streamstats current=f window=1 values(status) as prevStatus values(_time) as prevTime by server | eval diff=_time-prevTime

I'm not sure whether your values always alternate between Up and Down. If you only want the time from an Up to the following Down, you might need to add an eval such as | eval UpToDown=if(prevStatus="Up" AND status="Down",diff,null()), or something along those lines.
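For readers who want to check the logic outside Splunk, the previous-event approach above can be sketched in Python. This is only an illustration of the idea (remember the previous event per server, take the time difference, keep it only for Up followed by Down), not a reproduction of streamstats itself; the function name `up_to_down_seconds` is hypothetical.

```python
from datetime import datetime

def up_to_down_seconds(events):
    """Sum the Up -> Down intervals per server.

    events: iterable of (timestamp, server, status) tuples.
    Returns a dict mapping server -> seconds spent between an Up event
    and an immediately following Down event.
    """
    totals = {}
    prev = {}  # server -> (prevTime, prevStatus): the previous event only
    # Sort by server, then time, like "sort 0 server _time".
    for t, server, status in sorted(events, key=lambda e: (e[1], e[0])):
        totals.setdefault(server, 0.0)
        if server in prev:
            prev_time, prev_status = prev[server]
            diff = (t - prev_time).total_seconds()
            if prev_status == "Up" and status == "Down":
                totals[server] += diff  # the UpToDown case
        prev[server] = (t, status)
    return totals

def ts(minute):
    # Helper for the sample data: 8/27/12 at 10:<minute> AM.
    return datetime(2012, 8, 27, 10, minute)

sample = [
    (ts(30), "test1", "Down"), (ts(29), "test2", "Up"),
    (ts(28), "test3", "Down"), (ts(27), "test4", "Up"),
    (ts(26), "test1", "Up"),   (ts(25), "test2", "Down"),
    (ts(24), "test3", "Up"),   (ts(23), "test4", "Down"),
]
result = up_to_down_seconds(sample)
print(result)
```

With the sample data from the question, test1 and test3 each get 240 seconds and test2 and test4 get 0, because their only transition is Down to Up. Note that with this strict pairing, a sequence Up, Up, Down only counts the interval between the second Up and the Down.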

SplunkTrust

Is the server status reported on a regular basis, or does it come in only when the status changes? If it comes in regularly, I would expect to see an event every 5 minutes (or whatever the interval is). If it only comes in at a status change, you might go the entire search period without a single event for a server. That difference completely changes the approach to solving your problem.

Explorer

It's not on a regular basis. It gets reported only when the status changes. But there can also be cases where two consecutive statuses are both Up (or both Down).

Splunk Employee

You could try using transaction. This will combine the events and create a duration field holding the time between the two events: "| transaction server startswith=status=Up endswith=status=Down"

You would then need to total those durations over the period you care about (the last 24 hours, for example) and work out the percentage from that.
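The two steps described here (pair each Up with the next Down to get a duration, then divide the summed durations by the period) can be sketched in Python as follows. This is a rough emulation of the idea only, not of the transaction command's exact semantics; the function names and the 24-hour default are assumptions for illustration.

```python
from datetime import datetime, timedelta

def up_down_durations(events):
    """Seconds between each Up and the next Down for one server.

    events: list of (timestamp, status) tuples, in any order.
    """
    durations = []
    up_since = None
    for t, status in sorted(events):
        if status == "Up" and up_since is None:
            up_since = t  # open a new Up..Down span; repeated Ups are ignored
        elif status == "Down" and up_since is not None:
            durations.append((t - up_since).total_seconds())
            up_since = None  # close the span
    return durations

def percent_of_window(durations, window=timedelta(hours=24)):
    """Share of the window covered by the summed durations, in percent."""
    return 100.0 * sum(durations) / window.total_seconds()

# Hypothetical events for one server over one day.
day = datetime(2012, 8, 27)
events = [
    (day, "Up"),
    (day + timedelta(hours=6), "Up"),     # repeated Up, span stays open
    (day + timedelta(hours=12), "Down"),  # closes a 12-hour span
    (day + timedelta(hours=18), "Up"),    # never closed, so not counted
]
durations = up_down_durations(events)
print(durations, percent_of_window(durations))
```

In this sketch an Up with no closing Down contributes nothing, so uptime that is still in progress at the end of the period is undercounted. Whether that is acceptable depends on how you want to treat servers that are currently up.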
