Splunk Search

How to measure the server down time based on host and server status?

ShamGowda
Loves-to-Learn Lots

Sample Event:
sent=1 received=0 packet_loss=100 min_ping=NA avg_ping=NA max_ping=NA jitter=NA return_code=1 dest=SHTCE***

 

Tried code:

index=network
| eval Availability= case(received="1", 100,received="0", 0)
| stats avg(Availability) by dest
| sort +avg(Availability)
| rename avg(Availability) as "Availability %"
| streamstats current=f latest(packet_loss) as packet_loss latest(_time) as last_checked latest(_raw) AS prevEvent by dest
| eval downtime = _time - last_checked
| rename dest as Host
| table Host  


ITWhisperer
SplunkTrust

stats reduces your event pipeline to just two fields, avg(Availability) and dest. Therefore, the streamstats that follows has no packet_loss, _raw, or _time fields to work with, so last_checked and downtime come out empty.
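
To illustrate the point, here is a minimal, untested sketch in which every value needed after stats is carried through stats explicitly. It assumes the index=network events carry received, packet_loss and dest as in the sample event. The names availability_pct and last_checked are chosen for illustration only, and if(received > 0, 100, 0) stands in for the original case() expression.

```spl
index=network
| eval Availability = if(received > 0, 100, 0)
| stats avg(Availability) as availability_pct latest(_time) as last_checked latest(packet_loss) as packet_loss by dest
| sort availability_pct
| eval last_checked = strftime(last_checked, "%Y-%m-%d %H:%M:%S")
| rename dest as Host, availability_pct as "Availability %"
| table Host "Availability %" packet_loss last_checked
```

Anything that has to look at individual events, such as a streamstats over _time, would need to run before the stats line, not after it.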

ShamGowda
Loves-to-Learn Lots

Hi,

Can you please help me get the field values below?

Index  Application Transaction  Measurement Duration  Down Time    MTTR         Availability(%)
1      sgbuasdsdbp              1 day(s)              1 day(s)     1 day(s)     0
2      sgbuasdsasd              1 day(s)              9 hr 49 min  9 hr 49 min  59.04

ITWhisperer
SplunkTrust

Given the limited amount of data you provided, it is not possible to determine how such information would be derived.

Can you provide some more accurate (anonymised) events?

Can you explain how you want to calculate these fields?


ShamGowda
Loves-to-Learn Lots

I am able to fetch the server status information shown in the table below.

Using an add-on, I ping each server at 5-minute intervals, validate the server name against a lookup, and update the server status.

Now I need a report of each server's downtime and availability %:

Ex.: 

Availability = (uptime during the period / total time) × 100
For example, consider a report period of 1 week, which is 168 hours (24*7 calculation).

If a server was down for 3 hours in this 168-hour period, its availability would be
165/168*100 = 98.21%
Uptime in this example would be 165 hours (6 days 21 hours). Downtime is calculated from the time the server spends in a status that counts as down (0%).
So here the downtime would be 3 hours.
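
The arithmetic above can be checked with a small standalone search. This is only a sketch: makeresults generates a single dummy event, and the field names (total_hours, downtime_hours, uptime_hours, availability_pct) are made up for the example.

```spl
| makeresults
| eval total_hours = 24 * 7, downtime_hours = 3
| eval uptime_hours = total_hours - downtime_hours
| eval availability_pct = round(uptime_hours / total_hours * 100, 2)
| table total_hours downtime_hours uptime_hours availability_pct
```

This should return one row with total_hours=168, downtime_hours=3, uptime_hours=165 and availability_pct=98.21.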

 

Based on _time, _raw, and the status (received=1 or received=0), I need to calculate downtime and server availability in %. I was able to calculate the availability % with the code in my message above.
In the table below, last_checked is simply the time of the last ping.

Asset CI Name  RDP IP         Operating System  Category  Status  Environment Specification  last_checked
server1        10.100.00.001  Solaris 10        CAT 2     Down    PRODUCTION                 2 minutes ago
server2        10.100.00.002  Windows 2019 DC   CAT 2     Down    Production                 6 minutes ago
server3        10.100.00.003  Solaris 10        CAT 2     Up      PRODUCTION                 2 minutes ago

ITWhisperer
SplunkTrust

So, using streamstats as you have shown to get the previous event data for the dest, you could calculate the downtime as the difference in previous event time and current event time only if the previous event is a down event. Then you can sum the downtimes for each dest to give you the overall downtime for each dest. You then work out the total period for each dest covered by your search and subtract the total downtime for the dest to give you the total uptime, from which you can calculate the percentage availability.
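
A rough, untested sketch of those steps might look like the following. It assumes received=0 marks a down event (as in the sample event) and that the search time range is the report period. The names prev_time, prev_received, down_seconds, total_period and availability_pct are illustrative only.

```spl
index=network
| sort 0 dest _time
| streamstats current=f last(_time) as prev_time last(received) as prev_received by dest
| eval down_seconds = if(isnotnull(prev_time) AND prev_received = 0, _time - prev_time, 0)
| stats sum(down_seconds) as downtime range(_time) as total_period by dest
| eval uptime = total_period - downtime
| eval availability_pct = if(total_period > 0, round(uptime / total_period * 100, 2), null())
| eval downtime_readable = tostring(downtime, "duration")
| rename dest as Host, downtime_readable as "Down Time", availability_pct as "Availability %"
| table Host "Down Time" "Availability %"
```

The sort puts each dest's events in ascending time order, so that last() with current=f returns the event immediately before the current one. Here total_period is the span between the first and last event seen for each dest, so any time before the first ping or after the last ping in the search window is not counted.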


ShamGowda
Loves-to-Learn Lots

I understood your explanation. It would be helpful if you could write it as a sample query. Below is my sample code, but it does not calculate downtime.

index=network
| streamstats sparkline(avg(avg_ping)) as sparkline_ping avg(avg_ping) as ping max(max_ping) as max_ping latest(packet_loss) as packet_loss latest(_time) as last_checked range(avg_ping) as range min(avg_ping) as min by dest current=f
| search
| eval ping=round(ping, 0)." ms"
| eval average=round(avg_ping, 0)." ms"
| eval maximum=round(max_ping, 0)." ms"
| eval range=round(min, 0)." - ".round(min+range, 0)." ms"
| eval packet_loss=if(max_ping="NA",100,packet_loss)
| table dest packet_loss last_checked ping max_ping range sparkline_ping
| `timesince(last_checked,last_checked)`
| sort -ping
| lookup server_detail "Asset CI" as dest OUTPUTNEW "RDP IP" "Environment Specification" Category "Operating System"
| eval Status = case(packet_loss = "100","Down",packet_loss = "0","Up")
| eval Availability= case(packet_loss = "100",0,packet_loss = "0",100)
| stats avg(Availability) by dest
| sort +avg(Availability)
