
Search for events surrounding error clusters - transaction event too large

rlaan
Path Finder

I am trying to do some analysis on a historical/intermittent issue centered on a particular error in our logs.
This error occurs multiple times, usually in large bursts, within a 1m-3h window. I am trying to determine how best to find the start time and end time of an "error event" based on how far apart the individual error occurrences are.

I am currently using the following search.

<Events in same time frame as subsearch> [ search host=<server> "Error Message"
| transaction sourcetype maxpause=5m maxevents=-1
| table _time,duration
| eval earliest=_time-120
| eval latest=_time+duration+120
| fields earliest latest
| format "(" "(" "" ")" "OR" ")" ]

When looking at the subsearch on its own, this works when the errors contained in an event window have a low enough line count. However, there are a few large bursts that, when run with maxevents=-1, appear to corrupt the search results, so the start time and duration of those events cannot be extracted and displayed. The large transactions generated here have line counts of 82,000 or higher.

Is there a way to gather the start and end time of a group of events without creating a transaction so large that Splunk is unable to display it correctly?

I am also trying to use a portion of the subsearch on its own to create a table displaying the start time and duration of each event group; the large transactions cause this table command to fail to complete as well.

search host=<server> "Error Message"
| transaction sourcetype maxpause=5m maxevents=-1
| table _time,duration

Thank you for your help!


bowesmana
SplunkTrust

If you are hitting transaction limit issues, then you will have to look at stats to provide the constraints in the subsearch, as it won't have those limitations.

You can use stats range(_time) as duration to get the duration, and stats earliest/latest to get the bounding events, but much will depend on what your data gives you and how you're currently bounding your transaction (sourcetype).
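
For example, a minimal sketch of the subsearch rewritten with stats - assuming sourcetype is still the grouping field and keeping the 120-second padding from your original search:

search host=<server> "Error Message"
| stats earliest(_time) as start latest(_time) as end by sourcetype ``` bounding events per sourcetype ```
| eval earliest=start-120, latest=end+120 ``` same 120s padding as the transaction version ```
| fields earliest latest
| format "(" "(" "" ")" "OR" ")"

Note this gives one bounding window per sourcetype rather than one per burst, since plain stats has no maxpause equivalent.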

Not only will the query be much faster using stats than transaction, but your results will also be consistent, rather than varying with the time range and volume of your data 🙂

Hopefully this gives you something to work with


rlaan
Path Finder

Is there a way to use stats to capture multiple groups of events?
So, say, 80k of the same event marking one "error event", then 6 hours of nothing, followed by another group, etc.
I am not sure how to use the stats command to frame the time of multiple event groups in a single search.


bowesmana
SplunkTrust

You can use time buckets and do stats on that, like this

...search...
| bin span=5m _time as bucket ``` bin into a new field so the original _time values are kept ```
| stats range(_time) as duration earliest(_time) as first latest(_time) as last by bucket

which is using an arbitrary 5-minute time bucket - you could then post-process those stats to "group" adjacent non-empty buckets together, so you can collect a larger group; see the sketch below.
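
If you'd rather not pick a bucket size, here is a sketch of gap-based grouping with streamstats - assuming, like your original maxpause=5m, that a gap of more than 300 seconds starts a new group:

search host=<server> "Error Message"
| sort 0 _time ``` oldest first; the 0 lifts the sort row limit ```
| streamstats current=f last(_time) as prev_time ``` _time of the previous event ```
| eval new_group=if(isnull(prev_time) OR _time-prev_time>300, 1, 0)
| streamstats sum(new_group) as group_id ``` running counter that increments at each gap ```
| stats earliest(_time) as start latest(_time) as end count by group_id
| eval duration=end-start

Each output row is then one burst with its start, end, and duration, without ever building a single huge transaction.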

 
