I have the following events in Splunk:
_time Agent_Hostname alarm status
2020-08-23T03:04:05.000-0700 m50-ups.a_domain upsAlarmOnBypass raised
2020-08-23T03:07:16.000-0700 m50-ups.a_domain upsTrapOnBattery raised
2020-08-23T03:07:16.000-0700 m50-ups.a_domain upsAlarmInputBad raised
2020-08-23T03:07:39.000-0700 m50-ups.a_domain upsAlarmOnBypass raised
2020-08-23T03:07:39.000-0700 m50-ups.a_domain upsAlarmLowBattery raised
2020-08-23T03:08:17.000-0700 m50-ups.a_domain upsTrapOnBattery raised
2020-08-23T03:09:24.000-0700 m50-ups.a_domain upsTrapOnBattery raised
2020-08-23T03:10:31.000-0700 m50-ups.a_domain upsAlarmOnBattery cleared
2020-08-23T03:10:32.000-0700 m50-ups.a_domain upsAlarmInputBad cleared
2020-08-23T03:11:12.000-0700 m50-ups.a_domain upsAlarmLowBattery cleared
2020-08-23T03:19:06.000-0700 m50-ups.a_domain upsAlarmInputBad raised
2020-08-23T03:19:06.000-0700 m50-ups.a_domain upsTrapOnBattery raised
2020-08-23T03:19:13.000-0700 m50-ups.a_domain upsAlarmLowBattery raised
2020-08-23T03:20:10.000-0700 m50-ups.a_domain upsTrapOnBattery raised
2020-08-23T03:21:16.000-0700 m50-ups.a_domain upsTrapOnBattery raised
2020-08-23T03:22:22.000-0700 m50-ups.a_domain upsTrapOnBattery raised
2020-08-23T03:23:29.000-0700 m50-ups.a_domain upsTrapOnBattery raised
2020-08-23T03:24:28.000-0700 m50-ups.a_domain upsAlarmInputBad cleared
2020-08-23T03:24:28.000-0700 m50-ups.a_domain upsAlarmOnBattery cleared
2020-08-23T03:25:09.000-0700 m50-ups.a_domain upsAlarmLowBattery cleared
2020-08-23T03:25:58.000-0700 m50-ups.a_domain upsAlarmOnBypass cleared
My problem is how to compute the duration of each incident, for each host and each alarm type. For example,
from the above events I'd like to get the following:
start end Agent_Hostname alarm
2020-08-23T03:04:05.000-0700 2020-08-23T03:25:58.000-0700 m50-ups.a_domain upsAlarmOnBypass
2020-08-23T03:07:16.000-0700 m50-ups.a_domain upsTrapOnBattery
2020-08-23T03:07:16.000-0700 2020-08-23T03:24:28.000-0700 m50-ups.a_domain upsAlarmInputBad
2020-08-23T03:07:39.000-0700 2020-08-23T03:25:09.000-0700 m50-ups.a_domain upsAlarmLowBattery
where start is the earliest time at which an alarm is raised for a host, and
end is the time at which the same alarm is cleared for that host.
My second problem is how to find the longest duration among the completed spans, ignoring those without an end time.
My question is how I can achieve this within Splunk.
Hi @yshen, for the 1st query (a table of alarm, host, and duration), you can use the query below:
| makeresults
| eval start="2020-08-23T03:04:05.000-0700"
| eval end="2020-08-23T03:25:58.000-0700"
| eval Agent_hostname="m50-ups.a_domain"
| eval alarm="upsAlarmOnBypass"
| eval start_epoch=strptime(start,"%Y-%m-%dT%H:%M:%S.000-0700"), end_epoch=strptime(end,"%Y-%m-%dT%H:%M:%S.000-0700")
| eval duration_mins = ROUND((end_epoch - start_epoch)/60,2)
| table Agent_hostname alarm start end duration_mins
For the 2nd one:
| makeresults
| eval start="2020-08-23T03:04:05.000-0700"
| eval end="2020-08-23T03:25:58.000-0700"
| eval Agent_hostname="m50-ups.a_domain"
| eval alarm="upsAlarmOnBypass"
| search end!=""
| eval start_epoch=strptime(start,"%Y-%m-%dT%H:%M:%S.000-0700"), end_epoch=strptime(end,"%Y-%m-%dT%H:%M:%S.000-0700")
| eval duration_mins = ROUND((end_epoch - start_epoch)/60,2)
| fields - start_epoch end_epoch _time
| table Agent_hostname alarm start end duration_mins
| stats max(duration_mins) as max_duration_mins by Agent_hostname,alarm
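As a sanity check of the duration arithmetic in the evals above, here is a minimal sketch outside of Splunk, in Python. It assumes timestamps in the same format as the sample events; the function name `duration_mins` is only illustrative and is not part of any Splunk API.

```python
from datetime import datetime

# Same layout as the sample events: %f absorbs the ".000" milliseconds,
# %z absorbs the "-0700" UTC offset.
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

def duration_mins(start: str, end: str) -> float:
    """Minutes between two timestamps, rounded to 2 decimals like the eval."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return round(delta.total_seconds() / 60, 2)

# 03:04:05 to 03:25:58 is 21 min 53 s = 1313 s
print(duration_mins("2020-08-23T03:04:05.000-0700",
                    "2020-08-23T03:25:58.000-0700"))  # 21.88
```

Parsing the offset rather than matching it as literal text means the same function also handles events from other time zones.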
@Nisha18789 Thanks for your help.
For the first problem, I'm looking for a solution that can compute the start time and end time for each host and each alarm type. It seems that your suggestion for the first problem is hard-coded. It would not work in other situations, where the start time is not "2020-08-23T03:04:05.000-0700".
I have not fully understood your suggestion for the 2nd problem, apart from it having the same limitation of hard-coded start and end times. Your solution might work once the hard coding is removed.
I see. Thanks for the clarification!
Hi @yshen, the part where I hardcoded values is just a run-anywhere example.
Please use only the part highlighted in violet, with the existing fields in your logs, for both questions.
I studied your solution in detail. I'm afraid that it does not solve my problem.
Here is my understanding and paraphrase of your proposal:
| makeresults
| eval start="2020-08-23T03:04:05.000-0700"
| eval end="2020-08-23T03:25:58.000-0700"
| eval Agent_hostname="m50-ups.a_domain"
| eval alarm="upsAlarmOnBypass"
This creates the fields start, end, etc.
| eval start_epoch=strptime(start,"%Y-%m-%dT%H:%M:%S.000-0700"), end_epoch=strptime(end,"%Y-%m-%dT%H:%M:%S.000-0700")
| eval duration_mins = ROUND((end_epoch - start_epoch)/60,2)
| table Agent_hostname alarm start end duration_mins
Based on the known start and end values, this computes the duration.
But my key problem is how to find the proper values of start and end in the first place! Your proposal does not address this.
Maybe I should rephrase my question as: how do I find the earliest time an alarm is raised and the time when it is cleared?
Below is my sketch of a solution. It may not be perfect, but I hope it shows how start and end might be computed outside of Splunk:
;; assume the symbol events is for the events of alarms in chronological order
(as-> (group-by :Agent_Hostname events) ; group the events by Agent_Hostname value
    grouped-host                      
    (map                                ; for each host
     (fn [[host events-host]]
       [host
        (as-> (group-by                 ; group the events by alarm value
               #(alarm-classification (% :alarm)) events-host) grouped-alarm 
          (map                          ; for each alarm
           (fn [[alarm events-alarm]]
             [alarm
              (as->
                  (partition-by :status ; partition the events by same value of :status
                                events-alarm) x 
                  (map first x) ; keep only the first (earliest) event of each run of the same status
                  (partition 2 x)      ; combine the start and end events
                  (map start-end x))]) ; add the start and end time of an alarm event
           grouped-alarm)
          )])
     grouped-host))
;; => (["m50-ups.a_domain"
;;      (["upsAlarmOnBypass"
;;        ({:start "2020-08-23T03:04:05.000-0700",
;;          :end "2020-08-23T03:25:58.000-0700"})]
;;       ["upsAlarmOnBattery"
;;        ({:start "2020-08-23T03:07:16.000-0700",
;;          :end "2020-08-23T03:10:31.000-0700"}
;;         {:start "2020-08-23T03:19:06.000-0700",
;;          :end "2020-08-23T03:24:28.000-0700"})]
;;       ["upsAlarmInputBad"
;;        ({:start "2020-08-23T03:07:16.000-0700",
;;          :end "2020-08-23T03:10:32.000-0700"}
;;         {:start "2020-08-23T03:19:06.000-0700",
;;          :end "2020-08-23T03:24:28.000-0700"})]
;;       ["upsAlarmLowBattery"
;;        ({:start "2020-08-23T03:07:39.000-0700",
;;          :end "2020-08-23T03:11:12.000-0700"}
;;         {:start "2020-08-23T03:19:13.000-0700",
;;          :end "2020-08-23T03:25:09.000-0700"})])])
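The same pairing idea can also be sketched in Python, for readers who do not use Clojure. This is only an illustration under stated assumptions: events arrive as (time, host, alarm, status) tuples in chronological order, and the names `incidents`, `longest_closed`, and `classify` are hypothetical. `classify` stands in for the alarm-classification helper used above and defaults to leaving alarm names unchanged.

```python
from datetime import datetime
from itertools import groupby

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"  # layout of the sample timestamps

def incidents(events, classify=lambda alarm: alarm):
    """Pair each first 'raised' with the next 'cleared' per (host, alarm).

    Returns {(host, alarm): [(start, end_or_None), ...]}.
    """
    by_key = {}
    for time, host, alarm, status in events:
        by_key.setdefault((host, classify(alarm)), []).append((time, status))
    result = {}
    for key, evs in by_key.items():
        # Collapse each run of identical statuses to its earliest event,
        # like partition-by followed by (map first ...).
        firsts = [next(run) for _, run in groupby(evs, key=lambda e: e[1])]
        spans, start = [], None
        for time, status in firsts:
            if status == "raised":
                start = time
            elif status == "cleared" and start is not None:
                spans.append((start, time))
                start = None
        if start is not None:  # raised but never cleared
            spans.append((start, None))
        result[key] = spans
    return result

def longest_closed(spans_by_key):
    """Return (seconds, key, start, end) of the longest span that has an end."""
    best = None
    for key, spans in spans_by_key.items():
        for start, end in spans:
            if end is None:
                continue  # ignore incidents without an end time
            secs = (datetime.strptime(end, FMT)
                    - datetime.strptime(start, FMT)).total_seconds()
            if best is None or secs > best[0]:
                best = (secs, key, start, end)
    return best

host = "m50-ups.a_domain"
events = [  # a subset of the events listed in the question
    ("2020-08-23T03:04:05.000-0700", host, "upsAlarmOnBypass", "raised"),
    ("2020-08-23T03:07:16.000-0700", host, "upsAlarmInputBad", "raised"),
    ("2020-08-23T03:07:39.000-0700", host, "upsAlarmOnBypass", "raised"),
    ("2020-08-23T03:10:32.000-0700", host, "upsAlarmInputBad", "cleared"),
    ("2020-08-23T03:19:06.000-0700", host, "upsAlarmInputBad", "raised"),
    ("2020-08-23T03:24:28.000-0700", host, "upsAlarmInputBad", "cleared"),
    ("2020-08-23T03:25:58.000-0700", host, "upsAlarmOnBypass", "cleared"),
]
spans = incidents(events)
print(spans)
print(longest_closed(spans))
```

A "cleared" event with no preceding "raised" is skipped here, which is a design choice of this sketch rather than something the thread specifies.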
Hi @yshen, I understand your point now. Try the search below with your log events.
I am assuming your _time field has a format like 2020-08-23T03:04:05.000-0700 and that it represents the start time in each log event.
your base search....
| transaction Agent_Hostname alarm startswith="raised" endswith="cleared"
| eval end=_time+duration, start=_time
| eval end=strftime(end,"%Y-%m-%dT%H:%M:%S.%3N%z"), start=strftime(start,"%Y-%m-%dT%H:%M:%S.%3N%z")
| table start, end, Agent_Hostname, alarm, duration
Try it and let me know.
