I have following data from a log file and I am trying to create a Report on Splunk site.
2014-07-08 16:09:55.966473 | orderID | b6ef3555 | status= "TRADE", seq="34521", out_time ="34527"
2014-07-08 16:09:55.966473 | orderID | b6ef3555 | status= "TRADE", seq="34522", out_time ="34528"
2014-07-08 16:09:55.966473 | orderID | b6ef3559 | status= "NEW", seq="34522", out_time="34533"
2014-07-08 16:09:55.966473 | orderID | b6ef3560 | status= "NEW", seq="34522", out_time="34535"
2014-07-08 16:09:55.966473 | orderID | b6ef3562 | status= "NEW", seq="34523", out_time="34537"
2014-07-08 16:09:55.966473 | orderID | b6ef3555 | status= "TRADE", seq="34552", out_time ="34540"
2014-07-08 16:09:55.966473 | orderID | b6ef3561 | status= "TRADE", seq="34554", out_time="34559"
2014-07-08 16:09:55.966473 | orderID | b6ef3565 | status= "NEW", seq="34554", out_time="34568"
2014-07-08 16:09:55.966473 | orderID | b6ef3567 | status= "NEW", seq="34555", out_time="34570"
I want to calculate the out_time difference for only those status='NEW" message which can be matched with Trade based on seq field
My output should look like :
OrderID,out_time_diff,out_time_New,out_time_Trade
b6ef3559,5,34533,34528
b6ef3560,7,34535,34528
b6ef3565,14,34568,4554
I can match the Trade message to New message based on "seq" field but since I can have one to many relationships between trade to new message, I am unable to calculate the differences using "transaction seq" command.
Also plz note that I might have some extra trade and extra new messages which I do not want to take into consideration.
What are the other options apart from using transaction ?
Or is there a way I can use transaction so as to reuse the data from one line to multiple other lines ?
NOTE: Edited search to fix typos and add a where condition. I didn't have access to a splunk instance to test the search out before when I posted it. You might need to adjust field names and/or capitalization to match your instance (for example: orderID vs OrderID in the eval statement on line 4 of this search.
Another approach:
... your base search here ...
eval Base_Time=if(status=="TRADE",out_time,null())
| eval New_Time=if(status=="NEW",out_time,null())
| eval New_Time_and_OrderID=mvzip(New_Time,orderID)
| stats values(New_Time_and_OrderID) as New_Time_and_OrderID values(Base_Time) as Base_Time by seq
| mvexpand New_Time_and_OrderID
| rex field=New_Time_and_OrderID "(?<New_Time>[^,]+),(?<OrderID>[^,]+)"
| eval out_time_diff=New_Time-Base_Time
| where isnotnull(Base_Time)
| table OrderID out_time_diff Base_Time New_Time
This should give you the results you are looking for. When I run this on the sample data you provided, it gives the following results:
OrderID out_time_diff Base_Time New_Time
b6ef3559 5 34528 34533
b6ef3560 7 34528 34535
b6ef3565 9 34559 34568
OrderID's "b6ef3559" and "b6ef3560" are both "NEW" records, and both have the same sequence number (34522) corresponding to the sequence number of a single "TRADE" record. How does this work for you?
I've updated my original answer to address your comment, fix typos, and fix my regex. Please try it now and see if it works for you.
this doesnt work when I have one "trade" and 2 "new" data
Also I might have more new and trade messages which wont match up based on "seq" field, I only want to display the NEW and TRADE data which matches based on "seq" field
You should be able to use something along these lines:
base search | eventstats first(eval(case(status=="TRADE",out_time))) as out_time_trade by seq | search status="NEW" | eval out_time_diff = out_time - out_time_trade | table OrderID out_time_diff out_time out_time_trade
The eventstats
looks for the TRADE
event for each value of seq
and copies its out_time
into the NEW
events.
You can filter for that like this at the start of the second line:
... | search status="NEW" out_time_trade=* | ...
That way a NEW event will only be considered if there is a matching TRADE event for that seq value.
Actually I might have more new and trade messages which wont match up based on "seq" field, I only want to display the NEW and TRADE data which matches based on "seq" field