uptime / downtime pct over 30 days

tmarlette
Motivator

I'm trying to figure out how to show a device's uptime as a percentage over 30 days, in a way that works for both Linux and Windows data.

I am currently using

index=os sourcetype=Unix:Uptime 

as my data set, and it's a default data set that ships with the Linux TA. 

for windows I am using this search:

index=wineventlog LogName=System EventCode=6013 
| rex field=Message "uptime is (?<uptime>\d+) seconds"
| eval Uptime_Minutes=uptime/60
| eval LastBoot=_time-uptime
| convert ctime(LastBoot)
| eval uptime=tostring(uptime, "duration")
| stats latest(_time) as time by host, Message, uptime, LastBoot

 

Currently, I can't figure out how to account for a reboot that occurs during the month. The Linux data doesn't have a 'LastBoot' field like the Windows data, and I'm not sure how to create one.

The closest I've gotten is something like this, which works for either Linux or Windows as long as I rename or create the 'uptime' field in seconds.

index=nix sourcetype=Unix:Uptime 
| rename SystemUpTime as uptime
| streamstats sum(uptime) as total by host
| eval tot_up=(total/157697280)*100
| eval host_uptime=floor(tot_up)
| stats max(host_uptime) as pctUp by host



This is obviously crude, and I'm trying to refine it, so I'm looking for any help. I'm clearly missing something, and I'm sure I'm not the first person to ask a question like this, though I couldn't find anything specific on Answers.

I have a search that shows me total uptime as a duration for either Windows or Linux, and that's great! I'm just looking for the total uptime as a percentage over a 30-day span that accounts for reboots and legitimate hard-down incidents.

tscroggins
Motivator

@tmarlette 

If you're using Splunk Add-on for Unix and Linux and Splunk Add-on for Windows, you can use the uptime tag:

tag=uptime

Both add-ons have uptime inputs with default intervals of 86400 seconds (one day). Both source types have a field named uptime with a value in seconds.

With that understanding in hand, we can assume any value greater than or equal to 86400 represents 86400 seconds of uptime, and any value less than 86400 seconds is that value:

tag=uptime earliest=-30d@d latest=@d
| stats sum(eval(min(uptime, 86400))) as uptime by host
| eval uptime_percent=round(100*uptime/2592000, 2) ```2592000 = 86400 seconds * 30 days```
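
To see how the cap behaves when a host reboots partway through the window, here is a small Python sketch of the same capping arithmetic, with the result expressed as a percentage. It is only an illustration, not SPL: the function name `uptime_percent` and the sample values are made up, and it assumes exactly one uptime sample per host per day.

```python
DAY = 86400          # seconds in a day; also the assumed polling interval
WINDOW_DAYS = 30     # length of the reporting window

def uptime_percent(samples, window_days=WINDOW_DAYS):
    """Credit each daily uptime sample with at most one day, then
    divide by the total seconds in the window."""
    credited = sum(min(s, DAY) for s in samples)
    return 100.0 * credited / (DAY * window_days)

# Hypothetical host: 29 polls with uptime of a day or more,
# then one poll taken 6 hours after a reboot.
samples = [DAY * (n + 1) for n in range(29)] + [6 * 3600]
print(round(uptime_percent(samples), 2))  # 97.5
```

Note that the day containing the reboot is credited with only 6 hours, even if the host was actually down for a few minutes, so under these assumptions the result can understate real uptime.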

You may want to include an error measurement to allow for variation in uptime polling schedules, downtime following the last available uptime measurement, etc.
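
One possible way to express that error measurement, sketched in Python rather than SPL: report a low and a high figure, where the gap comes from polls that never arrived. The function name `uptime_bounds` is hypothetical, and the sketch assumes one expected sample per day; it does not model downtime after the last sample.

```python
DAY = 86400  # seconds in a day; also the assumed polling interval

def uptime_bounds(samples, window_days=30):
    """Return (low, high) uptime percentages for one host.

    low  counts only the seconds actually reported, each sample capped at one day.
    high additionally assumes the host was up for a full day on every
         day with no sample at all.
    """
    window = DAY * window_days
    credited = sum(min(s, DAY) for s in samples)
    missing = max(window_days - len(samples), 0)
    low = 100.0 * credited / window
    high = min(100.0, 100.0 * (credited + missing * DAY) / window)
    return low, high

# Hypothetical host that reported on only 28 of the 30 days.
low, high = uptime_bounds([DAY * 5] * 28)
print(round(low, 2), round(high, 2))  # 93.33 100.0
```

In the search itself, adding a count of samples per host to the stats command would give the number needed to compute the missing polls.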
