Splunk Search

How to Compute Incident Duration Records?

yshen
Communicator

I have the following events in Splunk:

_time                         Agent_Hostname    alarm               status
2020-08-23T03:04:05.000-0700  m50-ups.a_domain  upsAlarmOnBypass    raised
2020-08-23T03:07:16.000-0700  m50-ups.a_domain  upsTrapOnBattery    raised
2020-08-23T03:07:16.000-0700  m50-ups.a_domain  upsAlarmInputBad    raised
2020-08-23T03:07:39.000-0700  m50-ups.a_domain  upsAlarmOnBypass    raised
2020-08-23T03:07:39.000-0700  m50-ups.a_domain  upsAlarmLowBattery  raised
2020-08-23T03:08:17.000-0700  m50-ups.a_domain  upsTrapOnBattery    raised
2020-08-23T03:09:24.000-0700  m50-ups.a_domain  upsTrapOnBattery    raised
2020-08-23T03:10:31.000-0700  m50-ups.a_domain  upsAlarmOnBattery   cleared
2020-08-23T03:10:32.000-0700  m50-ups.a_domain  upsAlarmInputBad    cleared
2020-08-23T03:11:12.000-0700  m50-ups.a_domain  upsAlarmLowBattery  cleared
2020-08-23T03:19:06.000-0700  m50-ups.a_domain  upsAlarmInputBad    raised
2020-08-23T03:19:06.000-0700  m50-ups.a_domain  upsTrapOnBattery    raised
2020-08-23T03:19:13.000-0700  m50-ups.a_domain  upsAlarmLowBattery  raised
2020-08-23T03:20:10.000-0700  m50-ups.a_domain  upsTrapOnBattery    raised
2020-08-23T03:21:16.000-0700  m50-ups.a_domain  upsTrapOnBattery    raised
2020-08-23T03:22:22.000-0700  m50-ups.a_domain  upsTrapOnBattery    raised
2020-08-23T03:23:29.000-0700  m50-ups.a_domain  upsTrapOnBattery    raised
2020-08-23T03:24:28.000-0700  m50-ups.a_domain  upsAlarmInputBad    cleared
2020-08-23T03:24:28.000-0700  m50-ups.a_domain  upsAlarmOnBattery   cleared
2020-08-23T03:25:09.000-0700  m50-ups.a_domain  upsAlarmLowBattery  cleared
2020-08-23T03:25:58.000-0700  m50-ups.a_domain  upsAlarmOnBypass    cleared

My first problem is how to compute incident duration records for each host and each alarm type. For example,
from the above events I'd like to get the following:

start                         end                           Agent_Hostname    alarm
2020-08-23T03:04:05.000-0700  2020-08-23T03:25:58.000-0700  m50-ups.a_domain  upsAlarmOnBypass
2020-08-23T03:07:16.000-0700                                m50-ups.a_domain  upsTrapOnBattery
2020-08-23T03:07:16.000-0700  2020-08-23T03:24:28.000-0700  m50-ups.a_domain  upsAlarmInputBad
2020-08-23T03:07:39.000-0700  2020-08-23T03:25:09.000-0700  m50-ups.a_domain  upsAlarmLowBattery

where start is the earliest time an alarm is raised for a host, and
end is the time the same alarm is cleared for that host.

My second problem is how to find the longest duration among the closed spans, ignoring those without an end time.

How can I achieve this within Splunk?


Nisha18789
Builder

Hi @yshen, for the 1st question (a table of alarm, host, and duration), you can use the query below:

| makeresults
| eval start="2020-08-23T03:04:05.000-0700"
| eval end="2020-08-23T03:25:58.000-0700"
| eval Agent_Hostname="m50-ups.a_domain"
| eval alarm="upsAlarmOnBypass"

| eval start_epoch=strptime(start,"%Y-%m-%dT%H:%M:%S.000-0700"), end_epoch=strptime(end,"%Y-%m-%dT%H:%M:%S.000-0700")
| eval duration_mins = ROUND((end_epoch - start_epoch)/60,2)
| table Agent_Hostname alarm start end duration_mins

 

For 2nd one,

| makeresults
| eval start="2020-08-23T03:04:05.000-0700"
| eval end="2020-08-23T03:25:58.000-0700"
| eval Agent_Hostname="m50-ups.a_domain"
| eval alarm="upsAlarmOnBypass"
| search end!=""
| eval start_epoch=strptime(start,"%Y-%m-%dT%H:%M:%S.000-0700"), end_epoch=strptime(end,"%Y-%m-%dT%H:%M:%S.000-0700")
| eval duration_mins = ROUND((end_epoch - start_epoch)/60,2)
| fields - start_epoch end_epoch _time
| table Agent_Hostname alarm start end duration_mins
| stats max(duration_mins) as max_duration_mins by Agent_Hostname,alarm

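If the goal is the single longest closed span across all hosts and alarms, rather than one maximum per host and alarm, a variant along these lines might work. This is only a sketch: it assumes the preceding search already yields one row per incident with string fields start and end in the format shown in the question, and the first line is a placeholder for that search.

```spl
your search that yields start, end, Agent_Hostname, alarm ....
| where isnotnull(end) AND end!=""
| eval start_epoch=strptime(start,"%Y-%m-%dT%H:%M:%S.%3N%z"), end_epoch=strptime(end,"%Y-%m-%dT%H:%M:%S.%3N%z")
| eval duration_mins=round((end_epoch - start_epoch)/60,2)
| sort 0 -duration_mins
| head 1
| table Agent_Hostname alarm start end duration_mins
```

The where clause drops incidents that have no end time, %3N and %z parse the milliseconds and the UTC offset instead of hardcoding ".000-0700", and sort followed by head 1 keeps only the row with the largest duration.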
 

 

yshen
Communicator

@Nisha18789  Thanks for your help. 

For the first problem, I'm looking for a solution that computes the start time and end time for each host and each alarm type. Your suggestion for the first problem seems to be hardcoded. It would not work in other situations, where the start time is not "2020-08-23T03:04:05.000-0700".

I have not fully understood your suggestion for the 2nd problem, which has the same limitation of hardcoded start and end times. Your solution might work once the hardcoding is removed.


yshen
Communicator

@Nisha18789 

I see. Thanks for the clarification!


Nisha18789
Builder

Hi @yshen, the hardcoded part is just there to make a run-anywhere example.

For both questions, please use only the part highlighted in violet, with the existing fields from your logs.

 


yshen
Communicator

@Nisha18789 

I studied your solution in detail. I'm afraid that it does not solve my problems.

Here is my understanding and paraphrase of your proposal:

 

| makeresults
| eval start="2020-08-23T03:04:05.000-0700"
| eval end="2020-08-23T03:25:58.000-0700"
| eval Agent_Hostname="m50-ups.a_domain"
| eval alarm="upsAlarmOnBypass"

 

Create data fields of start, end, etc.

 

| eval start_epoch=strptime(start,"%Y-%m-%dT%H:%M:%S.000-0700"), end_epoch=strptime(end,"%Y-%m-%dT%H:%M:%S.000-0700")
| eval duration_mins = ROUND((end_epoch - start_epoch)/60,2)
| table Agent_Hostname alarm start end duration_mins

 

Based on the known start and end values, compute the duration.

But my key problem is how to find the proper values of start and end! Your proposal does not address this.

Maybe I should rephrase my question as: how do I find the earliest start of an alarm and the time when it is cleared?

Below is my sketch of a solution. It may not be perfect, but I hope it shows how the start and end might be computed outside of Splunk:

 

;; assume the symbol events is for the events of alarms in chronological order
(as-> (group-by :Agent_Hostname events) ; group the events by Agent_Hostname value
    grouped-host                      
    (map                                ; for each host
     (fn [[host events-host]]
       [host
        (as-> (group-by                 ; group the events by alarm value
               #(alarm-classification (% :alarm)) events-host) grouped-alarm 
          (map                          ; for each alarm
           (fn [[alarm events-alarm]]
             [alarm
              (as->
                  (partition-by :status ; partition the events by same value of :status
                                events-alarm) x 
                  (map first x) ; only take the first (the earliest) event of the same status)
                  (partition 2 x)      ; combine the start and end events
                  (map start-end x))]) ; add the start and end time of an alarm event
           grouped-alarm)
          )])
     grouped-host))
;; => (["m50-tc-ups.bart.gov"
;;      (["upsAlarmOnBypass"
;;        ({:start "2020-08-23T03:04:05.000-0700",
;;          :end "2020-08-23T03:25:58.000-0700"})]
;;       ["upsAlarmOnBattery"
;;        ({:start "2020-08-23T03:07:16.000-0700",
;;          :end "2020-08-23T03:10:31.000-0700"}
;;         {:start "2020-08-23T03:19:06.000-0700",
;;          :end "2020-08-23T03:24:28.000-0700"})]
;;       ["upsAlarmInputBad"
;;        ({:start "2020-08-23T03:07:16.000-0700",
;;          :end "2020-08-23T03:10:32.000-0700"}
;;         {:start "2020-08-23T03:19:06.000-0700",
;;          :end "2020-08-23T03:24:28.000-0700"})]
;;       ["upsAlarmLowBattery"
;;        ({:start "2020-08-23T03:07:39.000-0700",
;;          :end "2020-08-23T03:11:12.000-0700"}
;;         {:start "2020-08-23T03:19:13.000-0700",
;;          :end "2020-08-23T03:25:09.000-0700"})])])
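For comparison, the same steps (group by host and alarm, keep the first event of each run of equal status, pair each raise with the following clear) might be expressed in SPL roughly as follows. This is a sketch under assumptions, not a tested solution: the first eval stands in for the alarm-classification step and assumes upsTrapOnBattery should be treated as upsAlarmOnBattery, and prev_status, is_raise, incident, and cleared_time are helper field names made up for illustration.

```spl
your base search....
| eval alarm=if(alarm="upsTrapOnBattery", "upsAlarmOnBattery", alarm)
| sort 0 _time
| streamstats current=f last(status) as prev_status by Agent_Hostname alarm
| where isnull(prev_status) OR status!=prev_status
| eval is_raise=if(status="raised", 1, 0)
| streamstats sum(is_raise) as incident by Agent_Hostname alarm
| where incident>0
| eval cleared_time=if(status="cleared", _time, null())
| stats min(_time) as start max(cleared_time) as end by Agent_Hostname alarm incident
| eval duration_mins=round((end - start)/60, 2)
| eval start=strftime(start, "%Y-%m-%dT%H:%M:%S.%3N%z"), end=strftime(end, "%Y-%m-%dT%H:%M:%S.%3N%z")
| table start end Agent_Hostname alarm duration_mins
```

The first streamstats and the where that follows play the role of partition-by plus map first: an event survives only if its status differs from the previous event for the same host and alarm. The running sum of is_raise then numbers the incidents, so each raise and the clear that follows it share an incident value, and stats collapses them into one row. An incident that is never cleared keeps an empty end and an empty duration_mins.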

 

 


Nisha18789
Builder

Hi @yshen, I understand your point now. Try the search below with your log events.

I am assuming your _time field has a format like 2020-08-23T03:04:05.000-0700 and that it represents the start time in each log event.

your base search....
| transaction Agent_Hostname alarm startswith="raised" endswith="cleared"
| eval end=_time+duration, start=_time
| eval end=strftime(end,"%Y-%m-%dT%H:%M:%S.%3N%z"), start=strftime(start,"%Y-%m-%dT%H:%M:%S.%3N%z")
| table start, end, Agent_Hostname, alarm, duration
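To also list incidents that were never cleared, and to show the longest closed incident next to each row, the transaction search could perhaps be extended as below. This is a sketch only: keepevicted=true and the closed_txn field come from the transaction command, while closed_secs and longest_closed_secs are field names made up for illustration.

```spl
your base search....
| transaction Agent_Hostname alarm startswith="raised" endswith="cleared" keepevicted=true
| eval start=_time, end=if(closed_txn=1, _time+duration, null())
| eval closed_secs=if(closed_txn=1, duration, null())
| eventstats max(closed_secs) as longest_closed_secs
| eval start=strftime(start,"%Y-%m-%dT%H:%M:%S.%3N%z"), end=strftime(end,"%Y-%m-%dT%H:%M:%S.%3N%z")
| table start, end, Agent_Hostname, alarm, duration, longest_closed_secs
```

One thing worth checking against real data: as I read the transaction documentation, every event that matches startswith marks the beginning of a new transaction, so a repeated raised event before the clear may show up as a separate evicted row, and the closed transaction would then start at the last raise rather than the first.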

 

Try and let me know.

 
