Solved: Re: Calculating data with multiple transactions pe...

timm747747 · ‎04-04-2017

Hi, I have the following data with the following columns, OrderNo, Transaction Start, Transaction Stop. I wrote a search by OrderNo to get the time difference for each order. The problem is that Order Number 333 below has multiple transactions and I need to calculate based on every 2 lines of data based on OrderNo.

It works fine until I get to Orders that have multiple transactions.

index=myindex source=mysource Service=myservice OrderNo=*
|eval start_time = strrptime(transaction_start, "%Y-%m-%d %H:%M:%S")
| eval stop_time = strptime(transaction_stop, "%Y-%m-%d %H:%M:%S")
| stats earliest(start_time) as start_time earliest(stop_time) as stop_time by OrderNo, Service
| eval duration=tostring(stop_time-start_time)
| stats mean(duration) as avg_duration by Service
| table Service, avg_duration

Is it possible to read through one OrderNo to split it up into several transactions. It's obvious I shouldn't be using earliest but I just realized some of the orders have multiple transactions and after searching and coming up empty I ended up here.

Thanks!

woodcock · ‎04-04-2017

Like this:

index=myindex source=mysource Service=myservice OrderNo=*
|eval start_time = strrptime(transaction_start, "%Y-%m-%d %H:%M:%S")
| eval stop_time = strptime(transaction_stop, "%Y-%m-%d %H:%M:%S") 
| streamstats last(stop_time) as stop_time by OrderNo, Service
| search start_time="*"
| eval duration=tostring(stop_time-start_time) 
| eventstats mean(duration) as avg_duration by Service 
| table Service, avg_duration

View solution in original post

cmerriman · ‎04-04-2017

can you try using streamstats and filldown to get them into one line?

index=myindex source=mysource Service=myservice OrderNo=*
|eval start_time = strrptime(transaction_start, "%Y-%m-%d %H:%M:%S")
| eval stop_time = strptime(transaction_stop, "%Y-%m-%d %H:%M:%S") 
| filldown transaction_start
| streamstats count by OrderNo transaction_start
| search count=2
| eval duration=tostring(stop_time-start_time) 
| stats mean(duration) as avg_duration by Service 
| table Service, avg_duration

sideview · ‎04-04-2017

If you can make the assumption that the multiple transactions will never be overlapping, or interleaving, then you can use streamstats to make yourself an additional field that can supply the extra distinctness. Note the streamstats I've added in line 4, and the extra "transaction_count" field I added to the group by clause in the stats command.

index=myindex source=mysource Service=myservice OrderNo=*
| eval start_time = strrptime(transaction_start, "%Y-%m-%d %H:%M:%S")
| eval stop_time = strptime(transaction_stop, "%Y-%m-%d %H:%M:%S") 
| streamstats dc(start_time) as transaction_count by OrderNo
| stats earliest(start_time) as start_time earliest(stop_time) as stop_time by OrderNo, transaction_count, Service 
| eval duration=tostring(stop_time-start_time) 
| stats mean(duration) as avg_duration by Service 
| table Service, avg_duration

In the above, the | streamstats dc(start_time) as transaction_count by OrderNo command sneaks in just before the main stats, and it will paint a little integer onto each transaction. For the simple ones they'll each just get a "1" for that integer. But the ones with the multiple transaction will get a different integer for each transactoin.
Of course, if the multiple transactions for a given OrderNo can get interleaved, then this will become a mess. (You could even then craft an explicit search to test whether that assumption is true, possibly even run that second search for a while as an alert if you're paranoid about it. )

woodcock · ‎04-04-2017

Like this:

index=myindex source=mysource Service=myservice OrderNo=*
|eval start_time = strrptime(transaction_start, "%Y-%m-%d %H:%M:%S")
| eval stop_time = strptime(transaction_stop, "%Y-%m-%d %H:%M:%S") 
| streamstats last(stop_time) as stop_time by OrderNo, Service
| search start_time="*"
| eval duration=tostring(stop_time-start_time) 
| eventstats mean(duration) as avg_duration by Service 
| table Service, avg_duration

timm747747 · ‎04-05-2017

Thank you that worked!

sideview · ‎04-05-2017

Note that for your OrderNo values that have multiple transactions, this answer is going to calculate a single duration that is from the start of the Order's earliest transaction to the end of the latest transaction. it will factor that single large duration into the later average, rather than factoring in the individual transaction durations. But if that is OK or even desired, then indeed this approach is simpler.

Calculating data with multiple transactions per order

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Splunk App Dev Quarterly Roundup: AI, Agents, and Innovation!

Federated Search for Dynamic Data Self Storage Is Now Generally Available on Splunk ...

Index This | What has many keys but can’t unlock a door?

Join the Conversation