Splunk Search

how to mesure the duration of unfinished and uncompleted transactions ?

Communicator

Hi

I am collecting events of states of devices, a device is supposed to be go from the state "false"=beginning of transaction to "true"=end of transactions. I would like to retrieve the duration of my transactions, to track the long/unfinished ones.

My problem is that my logs doesn't always contains the end of transactions event.

Here is an example of my logs. I added comments at the end of the events. you can see that the same device/store pair may starts several transactions without finishing them all.

Thu Mar 31 10:56:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 1 OK start
Thu Mar 31 11:56:26 PDT 2011 Device=1 Store=1 Status=true detail=transaction 1 OK end
Thu Mar 31 12:00:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 2 broken start
Thu Mar 31 12:30:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 3 OK start
Thu Mar 31 12:40:26 PDT 2011 Device=1 Store=1 Status=true detail=transaction 3 OK end
Thu Mar 31 13:56:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 4 broken start
Thu Mar 31 14:56:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 5 broken start
Thu Mar 31 15:56:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 6 OK start
Thu Mar 31 16:56:26 PDT 2011 Device=1 Store=1 Status=true detail=transaction 6 OK end
Thu Mar 31 17:04:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 7 broken start
Thu Mar 31 17:10:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 8 start - unfinished yet

I would like to retrieve :

  • all the correct transactions with duration
  • every time a new transaction restart, I want to consider the previous one as finished
  • I also would like to have the duration of the last unfinished transactions till now.
1 Solution

Splunk Employee
Splunk Employee

This is hard but doable :

The difficult points are :

  • You don't have any unique transaction id
  • You are missing the end of transaction events
  • The transaction function is not enough to deal with all the invalid transactions (the startswith and the endswith cannot be to crazy)

The solution is to run 2 searches and append the result :

  • first search is a transaction keeping only the complete valid transactions (false -> true)
  • the second search is for the invalid transactions using a streamstats calculating the time interval between 2 identical device/store pair switching from false->false. (this no unique field is tempid_txn)
  • the result of the 2 searches is regrouped by append. don't forget to tune the limits of the sub-search.
  • for each group of event a unique field is created (id_txn) including the store, device and first timestamp. This field is used to dedup the transactions events existing in the 2 searches. (some false->false are simply wrong, because they may be a valid false->true in the first search)
  • then the duration of unfinished transactions is calculated to the current time.

At the end the useful fields to display are : Store, Device, duration, valid_txn, id_txn

source="*transaction.log" true OR false | eval valid_txn="valid" | eval id_txn=Device+"-"+Store+"-"+_time 
| transaction Device,Store startswith=false endswith=true keepevicted=false maxevents=2 mvlist=true 
| append maxout=100000 timeout=6000 maxtime=6000 
[ 
  search source="*transaction.log" false 
  | eval valid_txn="invalid" 
  | eval id_txn=Device+"-"+Store+"-"+_time 
  | eval tempid_txn=Device+"-"+Store 
  | streamstats range(_time) as duration window=2 by tempid_txn 
] 
| eval id_txn=mvindex(id_txn,0) 
| dedup id_txn 
| eval duration=if(duration=0,now()-_time,duration) 
| eval raw=_raw | table id_txn, Device, Store, duration, Status, closed_txn, valid_txn, raw 
| sort duration

voila !

View solution in original post

Splunk Employee
Splunk Employee

To provide another simplified example to count the duration of unfinished transactions

events :


2012-06-20 20:00:00.000 uniqueid=22 state=start
2012-06-20 21:00:00.000 uniqueid=22 state=end
2012-06-20 22:00:00.000 uniqueid=23 state=start


* | transaction uniqueid startswith="state=start" endswith="state=end" keeporphans=true
| eval duration_sec=if(isnull(duration),now()-_time,duration)
| eval status=if(closed_txn==1,"completed","still in progress")
| table _time duration_sec status _raw

We are keeping the orphean transactions and manually calculate the duration from first event to now().

Remarks :

  • duration is in seconds
  • closed_txt=1 means that the transaction has been closed, here we replace itby the text field status.
  • the duration will not work for realtime searches (the result of now() will not re-evaluated over the time, and be static)
0 Karma

Splunk Employee
Splunk Employee

This is hard but doable :

The difficult points are :

  • You don't have any unique transaction id
  • You are missing the end of transaction events
  • The transaction function is not enough to deal with all the invalid transactions (the startswith and the endswith cannot be to crazy)

The solution is to run 2 searches and append the result :

  • first search is a transaction keeping only the complete valid transactions (false -> true)
  • the second search is for the invalid transactions using a streamstats calculating the time interval between 2 identical device/store pair switching from false->false. (this no unique field is tempid_txn)
  • the result of the 2 searches is regrouped by append. don't forget to tune the limits of the sub-search.
  • for each group of event a unique field is created (id_txn) including the store, device and first timestamp. This field is used to dedup the transactions events existing in the 2 searches. (some false->false are simply wrong, because they may be a valid false->true in the first search)
  • then the duration of unfinished transactions is calculated to the current time.

At the end the useful fields to display are : Store, Device, duration, valid_txn, id_txn

source="*transaction.log" true OR false | eval valid_txn="valid" | eval id_txn=Device+"-"+Store+"-"+_time 
| transaction Device,Store startswith=false endswith=true keepevicted=false maxevents=2 mvlist=true 
| append maxout=100000 timeout=6000 maxtime=6000 
[ 
  search source="*transaction.log" false 
  | eval valid_txn="invalid" 
  | eval id_txn=Device+"-"+Store+"-"+_time 
  | eval tempid_txn=Device+"-"+Store 
  | streamstats range(_time) as duration window=2 by tempid_txn 
] 
| eval id_txn=mvindex(id_txn,0) 
| dedup id_txn 
| eval duration=if(duration=0,now()-_time,duration) 
| eval raw=_raw | table id_txn, Device, Store, duration, Status, closed_txn, valid_txn, raw 
| sort duration

voila !

View solution in original post

Splunk Employee
Splunk Employee

the valid_txn column is important to inform that the duration is valid, or based on estimations.

Don’t Miss Global Splunk
User Groups Week!

Free LIVE events worldwide 2/8-2/12
Connect, learn, and collect rad prizes
and swag!