Splunk Search

Chart grouped event over time with specific matching events

mattdaviscompar
Engager

I am currently trying to show a graphical representation of the number of times a specific thing happens. Whenever an event in our system fails processing, we retry another 15 times, so if it completely fails there will be 16 entries in Splunk. This all happens within a couple of seconds. Each log entry contains the GUID of the event and can be identified in Splunk.

What I have done so far:

1 - Created a custom field that identifies the GUID in the log entry; let's call it "eventid"
2 - Created a search that filters based on source and event type, groups by "eventid", and keeps only eventids with 16 of those events. Finally, it shows that in a timechart.

sourcetype="mysource" "IdentifyCorrectEvent"
| stats values(_time) as _time, count by eventid
| search count = 16
| timechart count
| timechart per_hour(count)

This works so far as to show a visual representation of how often this happens, but it counts individual log entries rather than failures. For example, one failure (16 errors) in an hour shows a count of 16, two failures in an hour show a count of 32, and so on.

How do I get the chart to show the number of times there were 16 errors for a single event? This is my first effort with Splunk, so feel free to say it is all wrong and that I should have done xyz.

1 Solution

lguinn2
Legend

Will this work?

sourcetype="mysource" "IdentifyCorrectEvent"
| timechart span=1m count by eventid
| eval count = ceiling(count/16)

The only problem that I can see with this solution would occur if a failed event's retries split across two periods, with exactly 8 failures in one and 8 in the other... it would be counted twice in that scenario, since time is sliced on minute boundaries.
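That boundary effect is easy to see outside Splunk. Here is a hypothetical Python sketch (not Splunk internals, just an illustration of the arithmetic): 16 retry timestamps straddle a minute boundary, and applying ceiling(count/16) per one-minute bucket counts the single failure twice.

```python
import math
from collections import Counter

# 16 retry timestamps (in seconds) for ONE failed event,
# straddling the 60s minute boundary: 8 before, 8 after.
timestamps = [52 + i for i in range(8)] + [61 + i for i in range(8)]

buckets = Counter(t // 60 for t in timestamps)   # slice on minute boundaries
failures = sum(math.ceil(c / 16) for c in buckets.values())
print(failures)  # 2 -- one real failure, double-counted
```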

Here is an alternate solution - in this case failures are defined based on eventid, but failures are also separated based on the time gap between events. In the example, if more than 10 seconds elapse between two events with the same id, they are considered different failures. This is a nice solution, but it will slow down significantly for huge numbers of events. (You could run the report over a shorter time period to compensate.)

sourcetype="mysource" "IdentifyCorrectEvent"
| transaction eventid maxpause=10s
| where eventcount > 15
| eval errorCount = round(eventcount / 16, 0)
| timechart sum(errorCount) as failure by eventid


mattdaviscompar
Engager

We happened to have a Splunk trainer in the building and he came up with pretty much the same solution. I don't have enough points to edit your answer, so I will put it here:

sourcetype="mysource" "IdentifyCorrectEvent"
| transaction maxspan=5s eventid
| where eventcount>=16
| table _time eventid eventcount
| timechart count
