
Chart grouped event over time with specific matching events

mattdaviscompar
Engager

I am currently trying to show a graphical representation of the number of times a specific thing happens x number of times. Whenever an event in our system is processed and fails, we retry it another 15 times, so if it fails completely there will be 16 entries in Splunk. This all happens within a couple of seconds. The log entry contains the GUID of the event, which can be identified in Splunk.

What I have done so far:

1 - Created a custom field that identifies the GUID in the log entry; let's call it "eventid".
2 - Created a search that filters based on source and event type, groups by "eventid", and keeps only the IDs that have 16 of those events. Finally, it shows the result in a time chart.

sourcetype="mysource" "IdentifyCorrectEvent" | stats values(_time) as _time, count by eventid  | search count = 16 | timechart count | timechart per_hour(count)

This works in so far as it shows a visual representation of the number of times this happens. For example, if we had one failure (16 errors) in an hour, it would show a count of 16; two failures in an hour would show a count of 32, and so on.
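To see why the chart reports 16 per failure, here is a minimal Python sketch (a model of the logic, not Splunk code) that treats the log as a list of hypothetical `(timestamp, eventid)` pairs. Counting raw entries per hour adds 16 for every failed event, while grouping by `eventid` first and counting only the groups with 16 entries adds one. The sample GUIDs, the one-hour buckets, and the helper names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumption from the question: 1 original attempt + 15 retries.
ENTRIES_PER_FAILURE = 16


def hour_of(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (one-hour buckets)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def raw_entries_per_hour(entries):
    """Count every log entry per hour, so one complete failure adds 16."""
    return Counter(hour_of(ts) for ts, _ in entries)


def failures_per_hour(entries):
    """Group entries by eventid, keep groups with exactly 16 entries,
    and count one failure per group in the hour of its first entry."""
    by_id = defaultdict(list)
    for ts, eventid in entries:
        by_id[eventid].append(ts)
    result = Counter()
    for times in by_id.values():
        if len(times) == ENTRIES_PER_FAILURE:
            result[hour_of(min(times))] += 1
    return result


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0, 0)
    # Hypothetical data: two GUIDs that failed completely, one that did not.
    entries = [(start.replace(second=i % 3), "guid-a") for i in range(16)]
    entries += [(start.replace(minute=30, second=i % 3), "guid-b") for i in range(16)]
    entries += [(start.replace(minute=45), "guid-c") for _ in range(5)]
    print(raw_entries_per_hour(entries)[start])  # 37
    print(failures_per_hour(entries)[start])     # 2
```

The difference is only in what is counted per bucket: entries in the first function, distinct failed `eventid` groups in the second.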

How do I get the chart to show the number of times there were 16 errors for a single event? This is my first effort with Splunk, so feel free to say it is all wrong and I should have done xyz.


lguinn2
Legend

Will this work?

sourcetype="mysource" "IdentifyCorrectEvent"
| bin _time span=1m
| stats count by _time, eventid
| eval failures = ceiling(count/16)
| timechart span=1h sum(failures) as failures

The only problem I can see with this solution is when the 16 entries for one event straddle a minute boundary (for example, 8 failures in one minute and 8 in the next). Time is sliced on minute boundaries and ceiling() rounds each partial count up to 1, so that event would be counted twice.
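The double counting is easy to reproduce outside Splunk. This Python sketch models the per-minute bucketing and the `ceiling(count/16)` step on hypothetical data (16 entries for one GUID, 0.25 seconds apart); it is a simplified model, not Splunk's implementation, and `burst` is a hypothetical helper. It also shows that the split does not have to be exactly 8 and 8: any split across a boundary rounds up to 1 on each side.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def minute_of(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute (one-minute bins)."""
    return ts.replace(second=0, microsecond=0)


def ceiling_failures(entries, per_failure=16):
    """Count entries per (minute, eventid) bucket, then round each bucket
    up with ceiling(count / per_failure) and add the results together."""
    buckets = Counter((minute_of(ts), eventid) for ts, eventid in entries)
    return sum(math.ceil(n / per_failure) for n in buckets.values())


def burst(start: datetime, eventid: str, n=16, gap=0.25):
    """Hypothetical helper: n entries for one eventid, `gap` seconds apart."""
    return [(start + timedelta(seconds=gap * i), eventid) for i in range(n)]


if __name__ == "__main__":
    # Entirely inside 09:00, so it is counted once.
    print(ceiling_failures(burst(datetime(2024, 1, 1, 9, 0, 10), "guid-a")))  # 1
    # Starts at 09:00:58, so 8 entries fall in 09:00 and 8 in 09:01.
    print(ceiling_failures(burst(datetime(2024, 1, 1, 9, 0, 58), "guid-a")))  # 2
    # An uneven split (4 + 12) is double-counted as well.
    print(ceiling_failures(burst(datetime(2024, 1, 1, 9, 0, 59), "guid-a")))  # 2
```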

Here is an alternate solution. Failures are still identified by eventid, but they are also separated by the time gap between events: in this example, if more than 10 seconds elapse between two events with the same ID, they are treated as different failures. This works well, but it will slow down significantly for very large numbers of events. (You could run the report over a shorter time period to compensate.)

sourcetype="mysource" "IdentifyCorrectEvent"
| transaction eventid maxpause=10s
| where eventcount > 15
| eval errorCount = round(eventcount / 16, 0)
| timechart sum(errorCount) as failure by eventid
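The `maxpause` grouping can be modelled in a few lines of Python. This is a simplified sketch on hypothetical data, not how Splunk's `transaction` command is implemented (the real command has many more options and its own event ordering). Entries for the same `eventid` stay in one run until the gap to the previous entry exceeds the pause limit, and each run of at least 16 entries is counted as one failure; for runs of exactly 16 entries that matches `round(eventcount / 16, 0)`. `burst` and the hourly bucketing are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def split_by_pause(entries, max_pause):
    """Group (timestamp, eventid) pairs into runs per eventid, starting a new
    run whenever the gap since the previous entry exceeds max_pause."""
    by_id = defaultdict(list)
    for ts, eventid in entries:
        by_id[eventid].append(ts)
    runs = []
    for eventid, times in by_id.items():
        times.sort()
        current = [times[0]]
        for prev, ts in zip(times, times[1:]):
            if ts - prev > max_pause:
                runs.append((eventid, current))
                current = []
            current.append(ts)
        runs.append((eventid, current))
    return runs


def failures_per_hour(entries, max_pause=timedelta(seconds=10), min_entries=16):
    """Count runs with at least min_entries entries, bucketed by the hour of
    the run's first entry (each run counts as one failure)."""
    counts = Counter()
    for _, times in split_by_pause(entries, max_pause):
        if len(times) >= min_entries:
            counts[times[0].replace(minute=0, second=0, microsecond=0)] += 1
    return counts


def burst(start, eventid, n=16, gap=0.1):
    """Hypothetical helper: n entries for one eventid, `gap` seconds apart."""
    return [(start + timedelta(seconds=gap * i), eventid) for i in range(n)]


if __name__ == "__main__":
    nine = datetime(2024, 1, 1, 9, 0, 0)
    # The same GUID fails twice, five minutes apart.
    entries = burst(nine, "guid-a") + burst(nine + timedelta(minutes=5), "guid-a")
    print(len(split_by_pause(entries, timedelta(seconds=10))))  # 2
    print(failures_per_hour(entries)[nine])                      # 2
```

In this example, grouping by `eventid` alone and filtering for `count = 16` would see 32 entries for `guid-a` and miss both failures; the pause-based split recovers them as two.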

mattdaviscompar
Engager

We happened to have a Splunk trainer in the building, and he came up with pretty much the same solution. I don't have enough points to edit your answer, so I will put it here:

sourcetype="mysource" "IdentifyCorrectEvent"
| transaction maxspan=5s eventid
| where eventcount>=16
| table _time eventid eventcount
| timechart count
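This version limits the total span of each transaction rather than the gap between entries. The Python sketch below is a simplified model of that idea on hypothetical data, not Splunk's actual `transaction` logic (Splunk may, for example, walk events newest first, so boundaries can fall differently). A run for one `eventid` may cover at most `max_span` from its first entry; runs with at least 16 entries count as one failed event. `burst` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_by_span(entries, max_span):
    """Group (timestamp, eventid) pairs into runs per eventid; a run may cover
    at most max_span from its first entry, otherwise a new run starts."""
    by_id = defaultdict(list)
    for ts, eventid in entries:
        by_id[eventid].append(ts)
    runs = []
    for times in by_id.values():
        times.sort()
        current = [times[0]]
        for ts in times[1:]:
            if ts - current[0] > max_span:
                runs.append(current)
                current = []
            current.append(ts)
        runs.append(current)
    return runs


def failed_events(entries, max_span=timedelta(seconds=5), min_entries=16):
    """Number of runs with at least min_entries entries (one per failed event)."""
    return sum(1 for run in split_by_span(entries, max_span) if len(run) >= min_entries)


def burst(start, eventid, n=16, gap=0.1):
    """Hypothetical helper: n entries for one eventid, `gap` seconds apart."""
    return [(start + timedelta(seconds=gap * i), eventid) for i in range(n)]


if __name__ == "__main__":
    nine = datetime(2024, 1, 1, 9, 0, 0)
    print(failed_events(burst(nine, "guid-a")))           # 1: retries finish in 1.5 s
    print(failed_events(burst(nine, "guid-b", gap=0.6)))  # 0: retries spread over 9 s
```

The trade-off is that a 5-second span assumes all 16 entries arrive close together, which fits "This all happens within a couple of seconds" from the question; if retries were ever spread further apart, a pause-based limit would be more forgiving.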
