We are receiving logs from a host (host=abc) with one interesting field named Ip_Address. This field holds multiple IPs, and an event is indexed every 5 minutes, e.g. "Ping success for Ip_Address=10.10.101.10" or "Ping failed for Ip_Address=10.10.101.10".
FYI, if the events are like 1:00pm ping failed and 1:05pm ping success, we do not count that towards the failed percentage.
So basically, only when a failure occurs more than once in a row (continuously, e.g. 1:00pm ping failed and 1:05pm ping failed) is it considered a failure.
I am using the query below to calculate the success and failed percentage of an IP over an interval of time such as one month, but it does not fulfil my requirement: I want to cover all IPs in a single query. It would be even more useful shown as a dashboard visualization.
index=unix sourcetype=ping_log "Ping failed for Ip_Address=10.101.101.14"
(earliest="01/04/2022:07:00:00" latest="01/07/2022:18:00:00")
OR (earliest="01/10/2022:07:00:00" latest="01/14/2022:18:00:00")
OR (earliest="01/17/2022:07:00:00" latest="01/21/2022:18:00:00")
OR (earliest="01/31/2022:07:00:00" latest="01/31/2022:18:00:00")
| timechart span=600s count
| where count=2
| stats count
| eval failed_min=count*10
| eval total=failed_min/9900*100,SLA=100-total,Ip_Address="10.101.101.14"
| rename SLA as Success_Percent
| table Success_Percent Ip_Address
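The hard-coded IP and minute count make this approach hard to scale. A sketch of the same 10-minute-bucket logic grouped per IP (keeping the `count=2` test and the assumed 9900-minute total from the windows above):

```
index=unix sourcetype=ping_log "Ping failed for Ip_Address="
| rex "Ip_Address=(?<Ip_Address>\d+\.\d+\.\d+\.\d+)"
| bin _time span=600s
| stats count by _time Ip_Address
| where count=2
| stats count as failed_buckets by Ip_Address
| eval failed_min=failed_buckets*10, Success_Percent=100-(failed_min/9900*100)
| table Ip_Address Success_Percent
```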
| gentimes start=-1 increment=10s
| eval Ip_Address="10.10.101.".((random()%20)+1)
| rename starttime as _time
| fields _time Ip_Address
| bin _time span=5m
| stats values(_time) as time values(Ip_Address) as Ip_Address
| mvexpand Ip_Address
| mvexpand time
| rename time as _time
| eval log="Ping ".mvindex(split("success|failed","|"),floor((random()%6)/5))." for Ip_Address=".Ip_Address
| fields _time log
``` The lines above create some random data ```
``` Extract status and ip address from log entry (if you don't already have these) ```
| rex field=log "Ping (?<status>\w+) for Ip_Address=(?<ip_address>\d+\.\d+\.\d+\.\d+)"
``` Get total events for each ip address (this may already be known if you log for every ip address in every 5 minute slot and you have fixed time ranges) ```
| eventstats count as total by ip_address
``` Sort by ip address and time ```
| sort 0 ip_address _time
``` We are only interested in failures ```
| where status="failed"
``` Find time difference between successive failures by ip address ```
| streamstats window=2 global=f range(_time) as time_difference by ip_address
``` We are only interested in continuous failures i.e. where time difference is 5 minutes (300 seconds) ```
| where time_difference=300
``` Count failures and keep total by ip address ```
| stats count values(total) as total by ip_address
``` Calculate percentage failures for ip address ```
| eval percentage=100*count/total
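To see what the `streamstats` window of 2 produces, here is a run-anywhere fragment with hypothetical timestamps (failures at 13:00, 13:05 and 13:20 for one IP): the gap is 0 for the first failure, 300 for a consecutive one, and larger for a non-consecutive one, so `where time_difference=300` keeps only the consecutive failures.

```
| makeresults count=3
| streamstats count as n
| eval _time=strptime("2022-01-04 13:00","%F %H:%M") + case(n=1,0, n=2,300, n=3,1200)
| eval ip_address="10.10.101.1"
| streamstats window=2 global=f range(_time) as time_difference by ip_address
```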
Thank you @ITWhisperer,
but I do not want data for all IP addresses. Only certain IP addresses are required, within the following time windows:
(earliest="01/04/2022:07:00:00" latest="01/07/2022:18:00:00")
OR (earliest="01/10/2022:07:00:00" latest="01/14/2022:18:00:00")
OR (earliest="01/17/2022:07:00:00" latest="01/21/2022:18:00:00")
OR (earliest="01/31/2022:07:00:00" latest="01/31/2022:18:00:00")
That was a run-anywhere example - replace the top part with your own search:
index=unix sourcetype=ping_log " for Ip_Address=10.101.101.14"
(earliest="01/04/2022:07:00:00" latest="01/07/2022:18:00:00")
OR (earliest="01/10/2022:07:00:00" latest="01/14/2022:18:00:00")
OR (earliest="01/17/2022:07:00:00" latest="01/21/2022:18:00:00")
OR (earliest="01/31/2022:07:00:00" latest="01/31/2022:18:00:00")
``` Extract status and ip address from log entry (if you don't already have these) ```
| rex field=_raw "Ping (?<status>\w+) for Ip_Address=(?<ip_address>\d+\.\d+\.\d+\.\d+)"
``` Get total events for each ip address (this may already be known if you log for every ip address in every 5 minute slot and you have fixed time ranges) ```
| eventstats count as total by ip_address
``` Sort by ip address and time ```
| sort 0 ip_address _time
``` We are only interested in failures ```
| where status="failed"
``` Find time difference between successive failures by ip address ```
| streamstats window=2 global=f range(_time) as time_difference by ip_address
``` We are only interested in continuous failures i.e. where time difference is 5 minutes (300 seconds) ```
| where time_difference=300
``` Count failures and keep total by ip address ```
| stats count values(total) as total by ip_address
``` Calculate percentage failures for ip address ```
| eval percentage=100*count/total
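To restrict the search to a list of IPs rather than one hard-coded address, the lookup can be turned into a search filter with a subsearch. A sketch, assuming the CSV is named Ping.csv and has a column named Ip_Address matching the search-time field in the raw events (rename the column if it differs):

```
index=unix sourcetype=ping_log " for Ip_Address="
    [ | inputlookup Ping.csv | fields Ip_Address | format ]
```

The `format` command expands the lookup rows into `( Ip_Address="x" ) OR ( Ip_Address="y" ) ...` terms for the outer search.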
@ITWhisperer, thank you.
Thanks for replying, but this query did not give the output we expected. We need the failed and success percentages for the IPs mentioned in our CSV file, within the stated time windows, and I hope you understand how I am calculating the failed percentage.
The final output should look like:

| IP_Address | Failed% | Success% |
|---|---|---|
| 1.1.1.1 | 0.5 | 99.5 |
| eval failed=100*count/total
| eval success=100-failed
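Combined with the preceding `stats`, the final rows can be shaped into the requested table; a sketch (column names `Failed` and `Success`, and the rounding, are my own choices):

```
| stats count values(total) as total by ip_address
| eval Failed=round(100*count/total,2), Success=round(100-Failed,2)
| table ip_address Failed Success
```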
Hi @ITWhisperer,
I am using the query below as you suggested and it gives the results as expected.
However, I forgot to mention 2 requirements for the query above:
1. The output only covers a few hosts, not all of them (i.e. all the IP addresses I have listed in the CSV file).
2. I need to restrict the data to specific times, e.g. from January 1st to January 17th and January 31st, weekdays from 7am to 6pm.
index=os sourcetype=ping_log
[ inputlookup Ping.csv]
(earliest="01/04/2022:07:00:00" latest="01/07/2022:18:00:00")
OR (earliest="01/10/2022:07:00:00" latest="01/14/2022:18:00:00")
OR (earliest="01/17/2022:07:00:00" latest="01/21/2022:18:00:00")
OR (earliest="01/31/2022:07:00:00" latest="01/31/2022:18:00:00")
| rex field=_raw "Ping (?<status>\w+) for Ip_Address=(?<ip_address>\d+\.\d+\.\d+\.\d+)"
| eventstats count as total by ip_address
| sort 0 ip_address _time
| where status="failed"
| streamstats window=2 global=f range(_time) as time_difference by ip_address
| where time_difference=300
| stats count values(total) as total by ip_address
| eval failed=100*count/total
| eval success=100-failed
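For requirement 2, the earliest/latest clauses above keep whole multi-day windows; one way to trim to weekdays 7am to 6pm is to derive the weekday and hour from the event timestamp and filter on them. A sketch of the top of the search (dates here are illustrative; adjust the outer window to your own range):

```
index=os sourcetype=ping_log
    [ | inputlookup Ping.csv | fields Ip_Address | format ]
earliest="01/01/2022:00:00:00" latest="02/01/2022:00:00:00"
| eval wday=strftime(_time,"%a"), hour=tonumber(strftime(_time,"%H"))
| where wday!="Sat" AND wday!="Sun" AND hour>=7 AND hour<18
```

For requirement 1, note that IPs with no failures drop out at the `where status="failed"` step; one approach is to append the lookup rows with a zero count after the final `stats` so every CSV IP appears in the output.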