Splunk Search

Calculating data with multiple transactions per order

Path Finder

Hi, I have data with the following columns: OrderNo, Transaction Start, Transaction Stop. I wrote a search grouped by OrderNo to get the time difference for each order. The problem is that Order Number 333 below has multiple transactions, so I need to calculate a duration for every two lines of data within an OrderNo.

[screenshot: sample data with OrderNo, Transaction Start, Transaction Stop]

It works fine until I get to Orders that have multiple transactions.

index=myindex source=mysource Service=myservice OrderNo=*
| eval starttime = strptime(transactionstart, "%Y-%m-%d %H:%M:%S")
| eval stoptime = strptime(transactionstop, "%Y-%m-%d %H:%M:%S")
| stats earliest(starttime) as starttime earliest(stoptime) as stoptime by OrderNo, Service
| eval duration=tostring(stoptime-starttime)
| stats mean(duration) as avgduration by Service
| table Service, avgduration
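For anyone following along: `strptime` converts the timestamp strings into epoch seconds, so the subtraction yields a duration in seconds. The equivalent logic outside Splunk, sketched in Python with made-up sample timestamps, would be:

```python
from datetime import datetime

# Hypothetical sample values in the same "%Y-%m-%d %H:%M:%S" format used in the search.
start = datetime.strptime("2020-01-01 10:00:00", "%Y-%m-%d %H:%M:%S")
stop = datetime.strptime("2020-01-01 10:02:30", "%Y-%m-%d %H:%M:%S")

# Subtracting the parsed times gives the duration in seconds.
duration = (stop - start).total_seconds()
```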

Is it possible to read through one OrderNo and split it up into its several transactions? Obviously I shouldn't be using earliest, but I only just realized that some orders have multiple transactions, and after searching and coming up empty I ended up here.

Thanks!

1 Solution

Esteemed Legend

Like this:

index=myindex source=mysource Service=myservice OrderNo=*
| eval start_time = strptime(transaction_start, "%Y-%m-%d %H:%M:%S")
| eval stop_time = strptime(transaction_stop, "%Y-%m-%d %H:%M:%S") 
| streamstats last(stop_time) as stop_time by OrderNo, Service
| search start_time="*"
| eval duration=tostring(stop_time-start_time) 
| eventstats mean(duration) as avg_duration by Service 
| table Service, avg_duration


Super Champion

Can you try using streamstats and filldown to get them onto one line?

index=myindex source=mysource Service=myservice OrderNo=*
| eval start_time = strptime(transaction_start, "%Y-%m-%d %H:%M:%S")
| eval stop_time = strptime(transaction_stop, "%Y-%m-%d %H:%M:%S") 
| filldown start_time
| streamstats count by OrderNo, start_time
| search count=2
| eval duration=tostring(stop_time-start_time) 
| stats mean(duration) as avg_duration by Service 
| table Service, avg_duration
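The idea here, sketched roughly in Python with made-up epoch-second values (and assuming events are sorted oldest-first so each transaction's start row precedes its stop row): fill the start time down onto the stop row, number the rows within each (order, start) pair, and keep the second row of each pair, which then has both times:

```python
# Events oldest-first; each row has either a start or a stop (made-up sample data).
events = [  # (order_no, start_time, stop_time)
    ("333", 1000, None),
    ("333", None, 1050),
    ("333", 1100, None),
    ("333", None, 1200),
]

# filldown start_time: copy the last seen start down onto following rows.
filled = []
last_start = {}
for order, start, stop in events:
    if start is not None:
        last_start[order] = start
    filled.append((order, last_start.get(order), stop))

# streamstats count by OrderNo, start_time: number rows within each pair,
# then keep count=2 (the stop row, which now also carries the start).
counts = {}
durations = []
for order, start, stop in filled:
    counts[(order, start)] = counts.get((order, start), 0) + 1
    if counts[(order, start)] == 2:
        durations.append((order, stop - start))
```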

SplunkTrust

If you can make the assumption that the multiple transactions will never be overlapping or interleaving, then you can use streamstats to make yourself an additional field that supplies the extra distinctness. Note the streamstats I've added in line 4, and the extra "transaction_count" field I added to the group-by clause in the stats command.

index=myindex source=mysource Service=myservice OrderNo=*
| eval start_time = strptime(transaction_start, "%Y-%m-%d %H:%M:%S")
| eval stop_time = strptime(transaction_stop, "%Y-%m-%d %H:%M:%S") 
| streamstats dc(start_time) as transaction_count by OrderNo
| stats earliest(start_time) as start_time earliest(stop_time) as stop_time by OrderNo, transaction_count, Service 
| eval duration=tostring(stop_time-start_time) 
| stats mean(duration) as avg_duration by Service 
| table Service, avg_duration

In the above, the | streamstats dc(start_time) as transaction_count by OrderNo command sneaks in just before the main stats, and it will paint a little integer onto each transaction. The simple ones will each just get a "1" for that integer, but the orders with multiple transactions will get a different integer for each transaction.
Of course, if the multiple transactions for a given OrderNo can get interleaved, then this will become a mess. (You could even then craft an explicit search to test whether that assumption is true, possibly even run that second search for a while as an alert if you're paranoid about it. )
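The running distinct count behind this trick can be sketched outside Splunk like so (Python, with made-up values; this shows only how each event gets tagged with the count of distinct start times seen so far for its order):

```python
# Made-up events: (order_no, start_time). Rows from the same transaction
# share a start time; a new start time bumps the distinct count.
events = [
    ("333", 1000),
    ("333", 1000),
    ("333", 1100),
    ("111", 500),
]

seen = {}    # per-order set of distinct start times, like dc(start_time) by OrderNo
tagged = []
for order, start in events:
    seen.setdefault(order, set()).add(start)
    tagged.append((order, start, len(seen[order])))  # the "little integer"
```

Grouping by that integer alongside OrderNo then keeps each transaction's rows distinct in the subsequent stats.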



Path Finder

Thank you that worked!


SplunkTrust

Note that for your OrderNo values that have multiple transactions, this answer is going to calculate a single duration from the start of the order's earliest transaction to the end of its latest transaction. It will factor that single large duration into the later average, rather than factoring in the individual transaction durations. But if that is OK, or even desired, then this approach is indeed simpler.
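The difference this note describes, with made-up numbers for an order that has two transactions:

```python
# Hypothetical order with two transactions, as (start, stop) epoch seconds.
transactions = [(1000, 1050), (1100, 1200)]

# Earliest start to latest stop: one large span covering both transactions.
single_span = max(stop for _, stop in transactions) - min(start for start, _ in transactions)

# Per-transaction durations: what would feed the average if pairs are kept separate.
per_txn = [stop - start for start, stop in transactions]
```

Here the single span is 200 seconds, while the individual transactions last 50 and 100 seconds, so the two approaches feed quite different values into the average.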
