Splunk Search

Joining two searches with and without stats command

greg
Communicator

Hi!

I have two sources A and В.

Source A contains events in form of:

Id1 StartTime1 EndTime1

Id2 StartTime2 EndTime2

...

Source B contains events like:

TxId1 TxStartTime11 TxEndTime11

TxId1 TxStartTime12 TxEndTime12

TxId1 TxStartTime13 TxEndTime13

...

TxId2 TxStartTime21 TxEndTime21

TxId2 TxStartTime22 TxEndTime22

TxId2 TxStartTime23 TxEndTime23

...

Id(i) and TxId(i) are the same, i.e these are the ids of related events.

I want to find the maximum of TxEndTime in each group of source B (grouped by TxId) and then calculate the difference between max(TxEndTime) and StartTime of the corresponding event in source A (Id = TxId).

The search for the start time is simple:

source=A | fields Id, StartTime

The search for finding maximum is quite easy too (omitting time conversions):

source=B | stats max(TxEndTime) as MaxTxEndTime by TxId

and it returns me a table [TxId][MaxTxEndTime].

The way to join two searches is also not so complicated (to align names of Id and TxId):

source=A OR source=B | eval UID=if(len(Id)>0,Id,if(len(TxID)>0,TxID,"")) | eval start = strptime(StartTime, "%Y-%m-%d %H:%M:%S") | eval end = strptime(TxEndTime, "%Y-%m-%d %H:%M:%S") 

But how can I put all the things together?

Is it possible to join the results of the search A with the search B to calculate the difference?
The problem is the first search returns the events while the second one returns the results of stats and I can't combine them for further calculations.

Tags (3)
1 Solution

sideview
SplunkTrust
SplunkTrust

I usually tie my conditional evals directly to the source, so as long as the source is consistent the search is a little easier for others to understand. The whole search would be something like:

source=A OR source=B | eval normalizedId=if(source="A",Id,TxId) | stats max(TxEndTime) as endTime min(StartTime) as startTime by normalizedId

using your eval instead, and doing some conversion at the end:

source=A OR source=B | eval UID=if(len(Id)>0,Id,if(len(TxID)>0,TxID,"")) | stats max(TxEndTime) as endTime min(StartTime) as startTime by UID | eval start = strptime(startTime, "%Y-%m-%d %H:%M:%S") | eval end = strptime(endTime, "%Y-%m-%d %H:%M:%S")

View solution in original post

greg
Communicator

Assuming we don't need to care about aligning different ID names (Id && TxId, they are both "Id" now), I managed to change my query like this:

source=A | join Id [search source=B | eval end = strptime(endTime, "%Y-%m-%d %H:%M:%S") | stats max(end) as txEnd by Id] | eval start = strptime(StartTime, "%Y-%m-%d %H:%M:%S") | eval Duration = txEnd - start | table _time, Duration 

That gives me the desired difference, but the search takes a certain time to complete on a larger amount of data. I guess, the JOIN statement is the cause, so is there any way to modify the search with OR to improve the performance?

greg
Communicator

Well, the difference between these 2 approaches is that OR adds new rows to the resulting set while JOIN adds new columns. Thus, the result after doing OR looks very similar to FULL OUTER JOIN in SQL except that even matching rows are listed separately (i.e. conjuction), which is the reason of a better search speed.

But sometimes you may need to compare fields within one resulting event and using OR doesn't help too much. In such case eventstats or transaction are good to use, or JOIN as really a last resort weapon.

0 Karma

sideview
SplunkTrust
SplunkTrust

Well you just shouldn't use join in this case because you don't need it (Join should be considered a weapon of last resort). it's much faster to use an OR as I did in my answer, and in addition to being a faster search it wont be subject to any hidden limits in times or number of result rows.

0 Karma

sideview
SplunkTrust
SplunkTrust

I usually tie my conditional evals directly to the source, so as long as the source is consistent the search is a little easier for others to understand. The whole search would be something like:

source=A OR source=B | eval normalizedId=if(source="A",Id,TxId) | stats max(TxEndTime) as endTime min(StartTime) as startTime by normalizedId

using your eval instead, and doing some conversion at the end:

source=A OR source=B | eval UID=if(len(Id)>0,Id,if(len(TxID)>0,TxID,"")) | stats max(TxEndTime) as endTime min(StartTime) as startTime by UID | eval start = strptime(startTime, "%Y-%m-%d %H:%M:%S") | eval end = strptime(endTime, "%Y-%m-%d %H:%M:%S")

View solution in original post

greg
Communicator

Yes, using OR really helps to achieve what I wanted in my original question:
(sourcetype="A" Field1=... Field2=...) OR
(sourcetype="B" Field3=... Field4=...) |
eval Uid=if(sourcetype="A",Id,TxId) |
stats min(StartTime) as start, max(TxEndTime) as end by Uid | where !isnull(start) AND !isnull(end) | eval Duration = End - Start | ...

Some additional filtering of NULL values was required as well since the resulting set contains all non-common fields in both sourcetypes as nulls.

0 Karma

greg
Communicator

Thanks for the hints, I mean, this elegant way of rewriting the if condition as if(source="A",Id,TxId).

0 Karma
.conf21 Now Fully Virtual!
Register for FREE Today!

We've made .conf21 totally virtual and totally FREE! Our completely online experience will run from 10/19 through 10/20 with some additional events, too!