Solved: How to avoid a join in a lookup search?

kalibaba2021 · ‎03-08-2022

I have 3 searches executing against same lookup, and since each lookup needs to be grouped by different set of fields, my search joins each result to the previous one.

I have a feeling this is not optimal, and want to rewrite it using stats , but don't know where to begin.

I want to create a report for Total number of Nodes per Node Type and Description, and this does what I want, but I am looking to optimize. Any ideas?

Code:

| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
by LargeNodeDesc MidSizeNodeDesc
]

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
by LargeNodeDesc
]

bowesmana · ‎03-09-2022

Can you post some of your lookup file - The NodeType='A' and 'B" stuff is confusing and the SPL doesn't seem to be correct, e.g. the last part of the SPL does

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit
]

but I suspect that the TtlSmallHost should be TtlLargeHost, so this seems to be pseudo code, not the real SPL

View solution in original post

bowesmana · ‎03-08-2022

If you need 3 splits, then you can use eventstats 3 times with a single inputlookup

| inputlookup my_lookup
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
by LargeNodeDesc MidSizeNodeDesc
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
by LargeNodeDesc
| stats values(Ttl*) as Ttl* by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc

I'm not sure what your input and therefore expected output is intended to look like, can you give an example of what the lookup holds and what you would see with your existing query

kalibaba2021 · ‎03-08-2022

Thanks for replying bowesmana,

There's actually another Compute unit I forgot named Host, but essentially with same logic. SPL is:

| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit MidComputeUnit SmallComputeUnit

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlMidSizeHost
by LargeComputeUnit MidComputeUnit
]

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit
]

I expect to see 3 Description text columns: LargeNodeDesc MidSizeNodeDesc SmallNodeDesc , then numeric columns TtlSmallNodes , TtlMidSizeNodes, TtlLargeNodes, and similar 3 numeric columns for Hosts, grouped by their respective descriptions. Something like:

LargeComputeUnitDesc	MidComputeUnitDesc	SmallComputeUnitDesc	TotalLargeNode	TatalMidSizeNodes	TotalSmalNodes	TotalLargeHosts	TatalMidSizeHosts	TotalSmalHosts
g3.4xlarge	C5	t2.micro	9	1	12	8	8	10
H1	M4	t2.nano	7	2	14	2	7	1
D2	X1e	etc.	5	3	19	3	5	6
R5	C4	etc.	2	4	9	8	6	8
R5n	R4	etc.	3	7	7	9	7	4

bowesmana · ‎03-09-2022

Can you post some of your lookup file - The NodeType='A' and 'B" stuff is confusing and the SPL doesn't seem to be correct, e.g. the last part of the SPL does

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit
]

but I suspect that the TtlSmallHost should be TtlLargeHost, so this seems to be pseudo code, not the real SPL

kalibaba2021 · ‎03-10-2022

bowesmana,

Yes, this is sample code since I can't post the real one due to privacy concerns. And yes, in the last clause the TtlSmallHost should be TtlLargeHost, my mistake. I believe I am getting the point though, which is:

specify the fields I want to see in the report

use stats early to group my data and limit the result set for splunk to process.

Thank you for your help, I think I got the gist of it. Coming from a sql background I automatically used a join but then realized it is not optimal in this case in splunk.

bowesmana · ‎03-13-2022

😀 @kalibaba2021 yes - it's always easy to spot an 'SQLer' with the liberal use of join.

There are some really good examples in this forum about avoiding join in many cases. It's almost always possible to avoid a join by using some form of stats, but it can sometimes be difficult to imagine the data flowing through the pipeline in Splunk to work out how to manipulate the data to do that join_with_stats.

Another one to watch out for and avoid if possible is 'transaction'. That has similar difficult to spot issues if you exceed limits and can often be avoided with stats.

A simple way to look at join with stats is to compare

search 1
| join id [
  search 2
]

and

search 1 OR search 2
| stats values(*) as * by id

Happy Splunking!

How to avoid a join in a lookup search?

join

lookup

stats

tstats

Data Management Digest – December 2025

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Join the Conversation