Hello everyone, I have a bit of a strange requirement, which includes close work with time values.
I have Splunk events in the following format:
event_time: 2023-06-29T14:49:42.787Z
shipment_status: delivered
timestamp: 2023-06-29T14:49:51.069Z
tracking_number:95AAEC4900000
Where shipment_status can have different values, but I need only in_transit and delivered, also timestamp field is basically the same as the built-in _time field.
I need to group events based on tracking_number field, and show the percentage of these pairs to the rest of the events, but do that in a way that in_transit events should have come after delivered events more than an hour after. The catch is - we should consider the difference between _time/timestamp field for in_transit event and event_time field for delivered event.
For example, my query:
event_status="delivered" OR event_status="in_transit" | transaction tracking_number startswith=event_status=delivered endswith=event_status=in_transit keepevicted=true | where duration > 3600 | stats sum(eventcount) as eventcount by closed_txn | eventstats sum(eventcount) as totalcount | where closed_txn == 1``` remaining eventcount only includes complete transactions ``` | eval percentage = eventcount * 100 / totalcount
uses a duration field, which automatically uses _time field for both events, but I need to get the duration of <_time/timestamp(second_event/in_transit)> - <event_time(first_event/delivered)> = 1h15m(for example), which is more than one hour, so this transaction should be included in the eventcount calculation.
Basically, I need to change this | where duration > 3600 condition in my query for a correct time calculation of both events in the transaction, fitting what I described earlier.
I still have not found a way to compare fields for these separate events in the transaction, so could someone offer some help? I will be very grateful for any suggestions or solutions!
Consider altering the _time field of the "delivered" events so the transaction command uses the desired timestamps.
event_status="delivered" OR event_status="in_transit"
| eval _time=if(event_status="delivered", strptime(event_time, "%Y-%m-%dT%H:%M:%S.%3N%Z"), _time)
| sort - _time
| transaction tracking_number startswith=event_status=delivered endswith=event_status=in_transit keepevicted=true
| where duration > 3600
| stats sum(eventcount) as eventcount by closed_txn
| eventstats sum(eventcount) as totalcount
| where closed_txn == 1``` remaining eventcount only includes complete transactions ```
| eval percentage = eventcount * 100 / totalcount
Good solution! I didn't even know we could intervene in the fields to this extent. But this solution didn't work for me. Splunk gives Error in 'transaction' command: This search requires events to be in descending time order, but the preceding search does not guarantee time-ordered events.
I don't really understand the problem, if the conditions are not met, then why not ignore this transaction in the final calculation? Probably has something to do with transaction conditions startswith and endswith. But I'm not sure. Do you happen to know a solution?
When transaction makes that complaint, throw in a sort. See my updated answer.
@richgalloway I have another query:
event_status="delivered" OR event_status="in_transit"
| transaction tracking_number startswith=event_status=delivered endswith=event_status=in_transit keepevicted=true
| where duration > 3600
| search closed_txn=1
Which returned 5 transactions fit for this condition for the last 2 weeks. Here's one of those(accidentally fits my final requirement):
{"timestamp":"2023-07-06T23:44:52.784Z","tracking_number":"C11900891311111","event_status":"delivered","event_time":"2023-07-06T23:44:38.722Z"}
{"timestamp":"2023-07-07T07:03:10.712Z","tracking_number":"C11900891311111","event_status":"in_transit","event_time":"2023-07-03T18:24:36.668Z"}
If we take a look at event_time field of the delivered event and assume that it will replace _time field for it, then after comparison of two timestamps in_transit event clearly came 8 hours after the first one.
For some reason, these 5 transactions are not included in the total calculations in the query that you helped me to compose:
event_status="delivered" OR event_status="in_transit"
| eval _time=if(event_status="delivered", strptime(event_time, "%Y-%m-%dT%H:%M:%S.%3N%Z"), _time)
| sort - _time
| transaction tracking_number startswith=event_status=delivered endswith=event_status=in_transit keepevicted=true
| where duration > 3600
| stats sum(eventcount) as eventcount by closed_txn
| eventstats sum(eventcount) as totalcount
| where closed_txn == 1``` remaining eventcount only includes complete transactions ```
| eval percentage = eventcount * 100 / totalcount
The strange thing is - I tried singling this transaction this way:
event_status="delivered" OR event_status="in_transit" AND "C11900891311111"
| eval _time=if(event_status="delivered", strptime(event_time, "%Y-%m-%dT%H:%M:%S.%3N%Z"), _time)
| sort - _time
| transaction tracking_number startswith=event_status=delivered endswith=event_status=in_transit keepevicted=true
| where duration > 3600
| stats sum(eventcount) as eventcount by closed_txn
And successfully found it!
But removing condition AND "C11900891311111" at the start of the query gives me 0 results. This is very strange, right?
P.S. Sorry for my continuous inquiries, I just really want to get to the root of the problem. I will upvote your answers as well!
Try this more generic query.
(event_status="delivered" OR event_status="in_transit") AND tracking_number=*
| eval _time=if(event_status="delivered", strptime(event_time, "%Y-%m-%dT%H:%M:%S.%3N%Z"), _time)
| sort - _time
| transaction tracking_number startswith=event_status=delivered endswith=event_status=in_transit keepevicted=true
| where duration > 3600
| stats sum(eventcount) as eventcount by closed_txn
Unfortunately, this didn't work for me. Still, no idea why those events are not found