Hi!
I have two sources A and В.
Source A contains events in form of:
Id1 StartTime1 EndTime1
Id2 StartTime2 EndTime2
...
Source B contains events like:
TxId1 TxStartTime11 TxEndTime11
TxId1 TxStartTime12 TxEndTime12
TxId1 TxStartTime13 TxEndTime13
...
TxId2 TxStartTime21 TxEndTime21
TxId2 TxStartTime22 TxEndTime22
TxId2 TxStartTime23 TxEndTime23
...
Id(i) and TxId(i) are the same, i.e these are the ids of related events.
I want to find the maximum of TxEndTime in each group of source B (grouped by TxId) and then calculate the difference between max(TxEndTime) and StartTime of the corresponding event in source A (Id
The search for the start time is simple:
source=A | fields Id, StartTime
The search for finding maximum is quite easy too (omitting time conversions):
source=B | stats max(TxEndTime) as MaxTxEndTime by TxId
and it returns me a table [TxId][MaxTxEndTime].
The way to join two searches is also not so complicated (to align names of Id and TxId):
source=A OR source=B | eval UID=if(len(Id)>0,Id,if(len(TxID)>0,TxID,"")) | eval start = strptime(StartTime, "%Y-%m-%d %H:%M:%S") | eval end = strptime(TxEndTime, "%Y-%m-%d %H:%M:%S")
But how can I put all the things together?
Is it possible to join the results of the search A with the search B to calculate the difference?
The problem is the first search returns the events while the second one returns the results of stats and I can't combine them for further calculations.
I usually tie my conditional evals directly to the source, so as long as the source is consistent the search is a little easier for others to understand. The whole search would be something like:
source=A OR source=B | eval normalizedId=if(source="A",Id,TxId) | stats max(TxEndTime) as endTime min(StartTime) as startTime by normalizedId
using your eval instead, and doing some conversion at the end:
source=A OR source=B | eval UID=if(len(Id)>0,Id,if(len(TxID)>0,TxID,"")) | stats max(TxEndTime) as endTime min(StartTime) as startTime by UID | eval start = strptime(startTime, "%Y-%m-%d %H:%M:%S") | eval end = strptime(endTime, "%Y-%m-%d %H:%M:%S")
Assuming we don't need to care about aligning different ID names (Id && TxId, they are both "Id" now), I managed to change my query like this:
source=A | join Id [search source=B | eval end = strptime(endTime, "%Y-%m-%d %H:%M:%S") | stats max(end) as txEnd by Id] | eval start = strptime(StartTime, "%Y-%m-%d %H:%M:%S") | eval Duration = txEnd - start | table _time, Duration
That gives me the desired difference, but the search takes a certain time to complete on a larger amount of data. I guess, the JOIN statement is the cause, so is there any way to modify the search with OR to improve the performance?
Well, the difference between these 2 approaches is that OR adds new rows to the resulting set while JOIN adds new columns. Thus, the result after doing OR looks very similar to FULL OUTER JOIN in SQL except that even matching rows are listed separately (i.e. conjuction), which is the reason of a better search speed.
But sometimes you may need to compare fields within one resulting event and using OR doesn't help too much. In such case eventstats or transaction are good to use, or JOIN as really a last resort weapon.
Well you just shouldn't use join in this case because you don't need it (Join should be considered a weapon of last resort). it's much faster to use an OR as I did in my answer, and in addition to being a faster search it wont be subject to any hidden limits in times or number of result rows.
I usually tie my conditional evals directly to the source, so as long as the source is consistent the search is a little easier for others to understand. The whole search would be something like:
source=A OR source=B | eval normalizedId=if(source="A",Id,TxId) | stats max(TxEndTime) as endTime min(StartTime) as startTime by normalizedId
using your eval instead, and doing some conversion at the end:
source=A OR source=B | eval UID=if(len(Id)>0,Id,if(len(TxID)>0,TxID,"")) | stats max(TxEndTime) as endTime min(StartTime) as startTime by UID | eval start = strptime(startTime, "%Y-%m-%d %H:%M:%S") | eval end = strptime(endTime, "%Y-%m-%d %H:%M:%S")
Yes, using OR really helps to achieve what I wanted in my original question:
(sourcetype="A" Field1=... Field2=...) OR
(sourcetype="B" Field3=... Field4=...) |
eval Uid=if(sourcetype="A",Id,TxId) |
stats min(StartTime) as start, max(TxEndTime) as end by Uid | where !isnull(start) AND !isnull(end) | eval Duration = End - Start | ...
Some additional filtering of NULL values was required as well since the resulting set contains all non-common fields in both sourcetypes as nulls.
Thanks for the hints, I mean, this elegant way of rewriting the if condition as if(source="A",Id,TxId).