topic How to avoid a join in a lookup search? in Splunk Search

How to avoid a join in a lookup search?

kalibaba2021 — Mon, 14 Mar 2022 13:30:20 GMT

I have 3 searches executing against same lookup, and since each lookup needs to be grouped by different set of fields, my search joins each result to the previous one.

I have a feeling this is not optimal, and want to rewrite it using stats , but don't know where to begin.

I want to create a report for Total number of Nodes per Node Type and Description, and this does what I want, but I am looking to optimize. Any ideas?

Code:

| inputlookup my_lookup | stats dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc | join type=left [| inputlookup my_lookup | stats dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes by LargeNodeDesc MidSizeNodeDesc ] | join type=left [| inputlookup my_lookup | stats dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes by LargeNodeDesc ]

Re: How to avoid a join in a lookup search

bowesmana — Tue, 08 Mar 2022 22:15:24 GMT

If you need 3 splits, then you can use eventstats 3 times with a single inputlookup

| inputlookup my_lookup | eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc | eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes by LargeNodeDesc MidSizeNodeDesc | eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes by LargeNodeDesc | stats values(Ttl*) as Ttl* by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc

I'm not sure what your input and therefore expected output is intended to look like, can you give an example of what the lookup holds and what you would see with your existing query

Re: How to avoid a join in a lookup search

kalibaba2021 — Wed, 09 Mar 2022 01:04:34 GMT

Thanks for replying bowesmana,

There's actually another Compute unit I forgot named Host, but essentially with same logic. SPL is:

| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit MidComputeUnit SmallComputeUnit

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlMidSizeHost
by LargeComputeUnit MidComputeUnit
]

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit
]

I expect to see 3 Description text columns: LargeNodeDesc MidSizeNodeDesc SmallNodeDesc , then numeric columns TtlSmallNodes , TtlMidSizeNodes, TtlLargeNodes, and similar 3 numeric columns for Hosts, grouped by their respective descriptions. Something like:

LargeComputeUnitDesc	MidComputeUnitDesc	SmallComputeUnitDesc	TotalLargeNode	TatalMidSizeNodes	TotalSmalNodes	TotalLargeHosts	TatalMidSizeHosts	TotalSmalHosts
g3.4xlarge	C5	t2.micro	9	1	12	8	8	10
H1	M4	t2.nano	7	2	14	2	7	1
D2	X1e	etc.	5	3	19	3	5	6
R5	C4	etc.	2	4	9	8	6	8
R5n	R4	etc.	3	7	7	9	7	4

Re: How to avoid a join in a lookup search

bowesmana — Wed, 09 Mar 2022 22:09:30 GMT

Can you post some of your lookup file - The NodeType='A' and 'B" stuff is confusing and the SPL doesn't seem to be correct, e.g. the last part of the SPL does

[| inputlookup my_lookup | stats dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost by LargeComputeUnit ]

but I suspect that the TtlSmallHost should be TtlLargeHost, so this seems to be pseudo code, not the real SPL

Re: How to avoid a join in a lookup search

kalibaba2021 — Thu, 10 Mar 2022 14:11:04 GMT

bowesmana,

Yes, this is sample code since I can't post the real one due to privacy concerns. And yes, in the last clause the TtlSmallHost should be TtlLargeHost, my mistake. I believe I am getting the point though, which is:

specify the fields I want to see in the report

use stats early to group my data and limit the result set for splunk to process.

Thank you for your help, I think I got the gist of it. Coming from a sql background I automatically used a join but then realized it is not optimal in this case in splunk.

Re: How to avoid a join in a lookup search

bowesmana — Mon, 14 Mar 2022 00:58:07 GMT

😀 @kalibaba2021 yes - it's always easy to spot an 'SQLer' with the liberal use of join.

There are some really good examples in this forum about avoiding join in many cases. It's almost always possible to avoid a join by using some form of stats, but it can sometimes be difficult to imagine the data flowing through the pipeline in Splunk to work out how to manipulate the data to do that join_with_stats.

Another one to watch out for and avoid if possible is 'transaction'. That has similar difficult to spot issues if you exceed limits and can often be avoided with stats.

A simple way to look at join with stats is to compare

search 1 | join id [ search 2 ]

and

search 1 OR search 2 | stats values(*) as * by id

Happy Splunking!