I am running a search for multiple events over a range of time. In that search, I want to only find events of one specific type if they occur 1 minute after the start of the time range.
The problem I am trying to solve here is dealing with time drift when working with transactions. For example, I have the events A, B, C which compose a transaction, but time drift will occasionally cause C to show up before A in Splunk: C, A, B. This causes my search to not find the final event in the transaction. My thought is that, if I only find transactions where the first event, A, happens 1 minute after the beginning of the search range then that will provide a buffer to deal with the time drift.
A sample set of event (This shows the clock skew, ServiceA at the bottom is actually where the chain of events begins, but the clockskew on our VMs causes it to show up last):
2015-12-09 19:12:25.5499|INFO|ServiceB|action=Received|corId=inventory_ceab9c8b6d694a09a9a10932097e632e|msgId=c87c2760468d45c0acbc246f25393ef3|parentId=8a3a1c5734414309ba6697a3ade10956
2015-12-09 19:12:25.5813|INFO|ServiceC|action=Sent|corId=inventory_ceab9c8b6d694a09a9a10932097e632e|msgId=c87c2760468d45c0acbc246f25393ef3|parentId=8a3a1c5734414309ba6697a3ade10956
2015-12-09 19:12:28.1707|INFO|ServiceA|action=Sent|eventType=inventoryUpdate|corId=inventory_ceab9c8b6d694a09a9a10932097e632e
If the field corId is unique to each transaction, then you can use stats instead.
For example, you could run
<your search> | stats values(*) as * by corId
You would get a table of each corId and the events that have that corId field. There are huge benefits to not using the transaction command. The search will run faster, and be more accurate (over large event sets transaction will throw out events to save memory)
On another note, I would check your NTP configuration. 4 secondstime delay is a big difference!
If the field corId is unique to each transaction, then you can use stats instead.
For example, you could run
<your search> | stats values(*) as * by corId
You would get a table of each corId and the events that have that corId field. There are huge benefits to not using the transaction command. The search will run faster, and be more accurate (over large event sets transaction will throw out events to save memory)
On another note, I would check your NTP configuration. 4 secondstime delay is a big difference!
The corId field is unique to each transaction, so I'll take a look at changing over to stats instead. The NTP configuration is also something which definitely is getting addressed.
However, I'm hoping to address the core problem, which is that no matter what there's some chance that the Complete event will show up before the Start event. With stats would be be possible to filter our corId's where the "Send" time is less than (time range minimum + 1)?
It shouldn't be an issue if you use stats
instead of transaction
, as you are no longer looking at the timestamp. As such it doesn't matter in what order the events appear in Splunk - stats will group them into the correct set based on the corId.
If this doesn't quite make sense it may be because of assumptions I am making about your data - it might help if you describe or show how you want your final results to look (ie what fields you wish to display)
Any chance you can post some sample events? You may not need to use transaction at all
Added some sample events in the original question. Thanks!