Re: Help showing the Uptime in days for a Universa...

johnward4 · ‎06-23-2020

Hello, I'm looking for help showing the Uptime/downtime percentage for my Universal Forwarders (past 7 days) :

I've seen many people trying to solve a similar use case on Answers but haven't quite seen what I'm looking for yet. I've been testing the query below. My thinking was to calculate the difference in minutes between a host's timestamps for the eval field values Action = "Splunkd Shutdown" and Action = "Splunkd Starting", then sum those minutes and divide by the total minutes in 1 week (10080) to get the downtime fraction, and from that the uptime. There are problems with this logic though: if the last time a host shut down is not within your search window, you won't get an accurate metric. I'm open to a discussion on how this can be monitored most accurately.

This query returns the host and timestamp for when splunkd shut down and another event with timestamp when Splunkd started.

index=_internal source="*SplunkUniversalForwarder*\\splunkd.log" (event_message="*Splunkd starting*" OR event_message="*Shutting down splunkd*") | eval Action = case(like(event_message, "%Splunkd starting%"), "Splunkd Starting", like(event_message, "%Shutting down splunkd%"), "Splunkd Shutdown")
| stats count by host, _time, Action

niketn · ‎06-25-2020

@johnward4 you are possibly looking for the /deployment/server/clients REST endpoint. (Refer to the Splunk documentation for details: https://docs.splunk.com/Documentation/Splunk/latest/RESTREF/RESTdeploy#deployment.2Fserver.2Fclients)

| rest splunk_server=local /services/deployment/server/clients
| fieldformat lastPhoneHomeTime=strftime(lastPhoneHomeTime,"%Y/%m/%d %H:%M:%S.%3N")
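
If the goal is only to see which clients look down right now, a threshold on the last phone-home time could be added. This is a minimal sketch, assuming lastPhoneHomeTime is epoch seconds (as the strftime call above implies) and that the endpoint returns a hostname field. The 600-second threshold and the field names secs_since_phone_home and status are illustrative only.

```
| rest splunk_server=local /services/deployment/server/clients
| eval secs_since_phone_home = now() - lastPhoneHomeTime
| eval status = if(secs_since_phone_home > 600, "possibly down", "up")
| table hostname lastPhoneHomeTime secs_since_phone_home status
| fieldformat lastPhoneHomeTime=strftime(lastPhoneHomeTime,"%Y/%m/%d %H:%M:%S.%3N")
```

The threshold should probably be a few multiples of whatever phone-home interval the clients are configured with, so that a single missed check-in does not flag a host.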

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

johnward4 · ‎06-26-2020

@niketn The rest command you recommended looks like it's meant for a deployment server. I'm using Splunk Cloud and don't have an on-prem deployment server, so I've tried using index=_internal source=*splunkd.log to monitor whether my UFs are online.

I'm looking to show a % of Uptime for the past 7 days. I'd like help on how to subtract the timestamps of two different Action values to show how long a host was down, and then divide the sum of that downtime by 7 days. I'm also open to suggestions for a better way to calculate this.

index=_internal source="*SplunkUniversalForwarder*\\splunkd.log" (event_message="*Splunkd starting*" OR event_message="*Shutting down splunkd*") | eval Action = case(like(event_message, "%Splunkd starting%"), "Splunkd Starting", like(event_message, "%Shutting down splunkd%"), "Splunkd Shutdown")
| stats count by host, _time, Action

This query returns the host and timestamp for when splunkd shut down and another event with timestamp when Splunkd started.
| stats values(Action) as Action by host, _time
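
One way the subtraction described above might be done, as a rough sketch rather than a tested solution: sort each host's events by time, use streamstats to carry the previous event's time and Action forward, and count the gap as downtime only when a "Splunkd Starting" event directly follows a "Splunkd Shutdown". The field names prev_time, prev_action, downtime_sec and uptime_pct are made up for illustration, and the constant 604800 assumes the search window is exactly 7 days (earliest=-7d@d latest=@d).

```
index=_internal source="*SplunkUniversalForwarder*\\splunkd.log" (event_message="*Splunkd starting*" OR event_message="*Shutting down splunkd*") earliest=-7d@d latest=@d
| eval Action = case(like(event_message, "%Splunkd starting%"), "Splunkd Starting", like(event_message, "%Shutting down splunkd%"), "Splunkd Shutdown")
| sort 0 host _time
| streamstats current=f last(_time) as prev_time last(Action) as prev_action by host
| eval downtime_sec = if(Action="Splunkd Starting" AND prev_action="Splunkd Shutdown", _time - prev_time, 0)
| stats sum(downtime_sec) as downtime_sec by host
| fillnull value=0 downtime_sec
| eval uptime_pct = round(100 * (1 - downtime_sec / 604800), 2)
```

This only measures gaps where both the shutdown and the following start fall inside the window. A host that shut down before the window began, or that is still down when it ends, will show a higher uptime than it really had, and a host with no matching events at all will not appear in the results.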

johnward4 · ‎06-25-2020

Since my hosts are Windows based I found this query to be helpful to show Uptime :

index=wineventlog host=* source="WinEventLog:System" EventCode=6013 
| rex field=Message "The system uptime is (?<SystemUpTime>\d+) seconds."
| dedup host 
| eval DaysUp=round(SystemUpTime/86400,2) 
| eval Years=round(DaysUp/365,2) 
| eval Months=round(DaysUp/30,2)
| table host DaysUp Years Months SystemUpTime
| sort host
| search DaysUp > 0 
| strcat DaysUp " Days" UpTime 
| sort - DaysUp
| table host UpTime
| fields - Years, Months, SystemUpTime

niketn · ‎06-26-2020

@johnward4 from the question I assumed you wanted real-time monitoring. If you want historical data, that might be the right approach. However, the REST API would be the fastest way to find out what is down right now.

Also, for the SPL you are using, I think the following does the same and would perform better:

index=wineventlog host=* source="WinEventLog:System" EventCode=6013 
| fields host Message
| rex field=Message "The system uptime is (?<SystemUpTime>\d+) seconds." 
| dedup host
| search SystemUpTime>86400
| eval UpTime=round(SystemUpTime/86400,2)
| sort - UpTime
| table host UpTime
| eval UpTime=UpTime." Days"

Also move the rex command to Field Extraction.

