
Search for events surrounding error clusters - transaction event too large

rlaan
Path Finder

I am trying to do some analysis on a historical/intermittent issue centered on a particular error in our logs.
This error occurs multiple times, usually in large bursts, within a 1m-3h window. I am trying to determine how best to find the start time and end time of an "error event" based on how far apart the individual error occurrences are.

I am currently using the following search.

<Events in same time frame as subsearch> [ search host=<server> "Error Message"
| transaction sourcetype maxpause=5m maxevents=-1
| table _time,duration
| eval earliest=_time-120
| eval latest=_time+duration+120
| fields earliest latest
| format "(" "(" "" ")" "OR" ")" ]

When looking at the subsearch on its own, this works when the errors contained in an event window have a low enough line count. However, there are a few large bursts that, when run with maxevents=-1, appear to corrupt the search results, so the start time and duration of those events cannot be extracted and displayed. The large transactions generated here have line counts of 82,000 or higher.

Is there a way to gather the start and end time of a group of events without creating a transaction so large that Splunk is unable to display it correctly?

I am also trying to use a portion of the subsearch on its own to create a table displaying the start time and duration of each event group; the large transactions cause this table command to fail to complete as well.

search host=<server> "Error Message"
| transaction sourcetype maxpause=5m maxevents=-1
| table _time,duration

Thank you for your help!


bowesmana
SplunkTrust

If you are hitting transaction limit issues, then you will have to look at stats to provide the constraints in the subsearch, as it won't have those limitations.

You can use stats range(_time) as duration to get the duration, and stats earliest/latest to get the bounding events, but much will depend on what your data gives you and how you're currently bounding your transaction (sourcetype).
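
For example, a minimal sketch of the subsearch rewritten with stats - assuming sourcetype is still the grouping field and keeping the 120-second padding from your original search:

search host=<server> "Error Message"
| stats earliest(_time) as start latest(_time) as end by sourcetype ``` bounding events per sourcetype ```
| eval earliest=start-120, latest=end+120 ``` same 120s padding as the transaction version ```
| fields earliest latest
| format "(" "(" "" ")" "OR" ")"

Note this gives one bounding window per sourcetype rather than one per burst, since plain stats has no maxpause equivalent.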

Not only will the query be much faster using stats than transaction, but your results will also be consistent, rather than varying with the time range and volume of your data 🙂

Hopefully this gives you something to work with


rlaan
Path Finder

Is there a way to use stats to capture multiple groups of events?
So, say, 80k of the same event marking one "error event", then 6 hours of nothing, followed by another group, etc.
I am not sure how to use the stats command to frame the time of multiple event groups in a single search.


bowesmana
SplunkTrust

You can use time buckets and do stats on that, like this

...search...
| bin span=5m _time as bucket ``` bin into a new field so the original _time values are kept ```
| stats range(_time) as duration earliest(_time) as first latest(_time) as last by bucket

which is using an arbitrary 5-minute time bucket - you could then post-process those stats to "group" adjacent non-empty buckets together, so you can collect a larger group; see the sketch below.
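
If you'd rather not pick a bucket size, here is a sketch of gap-based grouping with streamstats - assuming, like your original maxpause=5m, that a gap of more than 300 seconds starts a new group:

search host=<server> "Error Message"
| sort 0 _time ``` oldest first; the 0 lifts the sort row limit ```
| streamstats current=f last(_time) as prev_time ``` _time of the previous event ```
| eval new_group=if(isnull(prev_time) OR _time-prev_time>300, 1, 0)
| streamstats sum(new_group) as group_id ``` running counter that increments at each gap ```
| stats earliest(_time) as start latest(_time) as end count by group_id
| eval duration=end-start

Each output row is then one burst with its start, end, and duration, without ever building a single huge transaction.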

 
