I have logs from two apps to analyze. In general, a session of app interaction (as it is represented in the logs) looks like this:
Each message or response gets a new ID. Each response also carries the ID of the request message it answers.
The log file of App1 consists of many chunks like:
[timestamp]|SEND1|[XXX7]
[timestamp]|RECV2|[XXX8]
[timestamp]|SEND3|[XXX9]|[XXX8]
The log file of App2 consists of many chunks like:
[timestamp]|RECV1|[XXX7]
[timestamp]|SEND2|[XXX8]|[XXX7]
Where [XXXn] is some random message ID.
The apps are asynchronous, so log records from several sessions can be interleaved.
So if you combine both logs, group them logically by message ID, and sort them by timestamp, you get something like this:
log from App1                      log from App2
[timestamp]|SEND1|[XXX7]
                                   [timestamp]|RECV1|[XXX7]
                                   [timestamp]|SEND2|[XXX8]|[XXX7]
[timestamp]|RECV2|[XXX8]
[timestamp]|SEND3|[XXX9]|[XXX8]
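To make the grouping concrete, here is a minimal Python sketch of the same idea outside Splunk. The record tuples, the millisecond timestamps, and the function names are all assumptions made for illustration; the real logs would need to be parsed into this shape first.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records, already parsed from both logs:
# (timestamp_ms, app, operation, message_id, request_id or None)
records = [
    (10000, "App1", "SEND1", "A7", None),
    (10100, "App1", "SEND1", "B1", None),   # a second session, interleaved
    (10200, "App2", "RECV1", "A7", None),
    (10400, "App2", "RECV1", "B1", None),
    (10500, "App2", "SEND2", "A8", "A7"),
    (10900, "App1", "RECV2", "A8", None),
    (11000, "App1", "SEND3", "A9", "A8"),
]

def group_sessions(records):
    """Group records into sessions by following request IDs back to the first message."""
    parent = {}  # message ID -> ID of the request it responds to
    for _, _, _, msg_id, req_id in records:
        if req_id is not None:
            parent[msg_id] = req_id

    def root(msg_id):
        # Walk the response -> request chain until the originating message.
        while msg_id in parent:
            msg_id = parent[msg_id]
        return msg_id

    sessions = defaultdict(list)
    for rec in records:
        sessions[root(rec[3])].append(rec)
    for recs in sessions.values():
        recs.sort(key=lambda r: r[0])  # order each session by timestamp
    return dict(sessions)

def average_step_delays(sessions):
    """Average delay (ms) between consecutive records, keyed by (previous op, op)."""
    delays = defaultdict(list)
    for recs in sessions.values():
        for prev, cur in zip(recs, recs[1:]):
            delays[(prev[2], cur[2])].append(cur[0] - prev[0])
    return {pair: mean(values) for pair, values in delays.items()}
```

With the sample data, `average_step_delays(group_sessions(records))` averages the SEND1 to RECV1 delay over both sessions, while the later steps come from the one session that has them.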
Is there a way to get statistics on the average delay (per second/minute/hour) between each pair of records?
Assuming that MessageID is a multivalued field
(if not, then do what you need to do to make sure that it is), then you can do it like this:
sourcetype=App1 OR sourcetype=App2 | transaction MessageID mvlist=_time | streamstats current=t count AS serial | mvexpand _time | streamstats current=f last(_time) AS prevTime by serial | eval delta=_time-prevTime | stats avg(delta)
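The search above ends with a single overall average. The per-second/minute/hour breakdown asked about in the question amounts to assigning each delay to a time bucket before averaging. As a rough illustration outside Splunk, assuming the delays are already available as `(epoch_seconds, delay_seconds)` pairs (a hypothetical input shape), that could look like this in Python:

```python
from collections import defaultdict

def average_by_bucket(samples, bucket_seconds=60):
    """Average delay per time bucket.

    samples: iterable of (epoch_seconds, delay_seconds) pairs (assumed shape).
    bucket_seconds: 1 for per-second, 60 for per-minute, 3600 for per-hour.
    Returns {bucket_start_epoch: average_delay}, ordered by bucket start.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket start -> [sum, count]
    for ts, delay in samples:
        start = int(ts // bucket_seconds) * bucket_seconds
        totals[start][0] += delay
        totals[start][1] += 1
    return {start: total / count for start, (total, count) in sorted(totals.items())}
```

Changing `bucket_seconds` switches the granularity without touching the rest of the logic.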
I used something like this:
sourcetype=App1 OR sourcetype=App2 | eval code=_time+","+OperationCode | makemv delim="|" MessageID | transaction MessageID maxevents=5 | mvexpand code | rex field=code "(?<_time>\d+\.\d+),(?<OperationCode>\w+\d+)" | streamstats current=f last(_time) AS prevTime by MessageID | eval delta=_time-prevTime | stats avg(delta)
I didn't understand why "streamstats current=t count AS serial" is needed. I had to use "code" with mvexpand instead of "_time" because a transaction record has only one value for the _time field.
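The point of the "code" field is that each timestamp stays attached to its operation code when the combined record is split back into rows. A small Python sketch of that pack-and-unpack step, using the same pattern as the rex above (the function name and sample strings are made up for illustration):

```python
import re

# Same shape as the rex pattern above: "<seconds.fraction>,<OperationCode>"
CODE_RE = re.compile(r"(?P<time>\d+\.\d+),(?P<op>\w+\d+)")

def expand_codes(codes):
    """Split packed "time,operation" strings into (time, operation) rows, sorted by time."""
    rows = []
    for code in codes:
        match = CODE_RE.fullmatch(code)
        if match:  # strings that do not fit the pattern are skipped
            rows.append((float(match.group("time")), match.group("op")))
    return sorted(rows)
```

For example, `expand_codes(["10.500,SEND2", "10.000,SEND1"])` yields the two rows in time order, each still paired with its own operation code.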
Your adjustments seem to be entirely appropriate, especially since they achieved the desired results.