I have 3 searches executing against the same lookup, and since each needs to be grouped by a different set of fields, my search joins each result to the previous one.
I have a feeling this is not optimal and want to rewrite it using stats, but I don't know where to begin.
I want to create a report of the total number of nodes per node type and description. The search below does what I want, but I am looking to optimize it. Any ideas?
Code:
| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc
| join type=left
[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
by LargeNodeDesc MidSizeNodeDesc
]
| join type=left
[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
by LargeNodeDesc
]
Can you post some of your lookup file? The NodeType="A" and "B" stuff is confusing, and the SPL doesn't seem to be correct, e.g. the last part of the SPL does
[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit
]
but I suspect that TtlSmallHost should be TtlLargeHost, so this seems to be pseudocode, not the real SPL.
If you need 3 splits, you can use eventstats 3 times with a single inputlookup:
| inputlookup my_lookup
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
by LargeNodeDesc MidSizeNodeDesc
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
by LargeNodeDesc
| stats values(Ttl*) as Ttl* by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc
I'm not sure what your input, and therefore expected output, is intended to look like. Can you give an example of what the lookup holds and what you would see with your existing query?
Thanks for replying bowesmana,
There's actually another compute unit I forgot, named Host, but with essentially the same logic. The SPL is:
| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit MidComputeUnit SmallComputeUnit
| join type=left
[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlMidSizeHost
by LargeComputeUnit MidComputeUnit
]
| join type=left
[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit
]
I expect to see 3 description text columns: LargeNodeDesc, MidSizeNodeDesc, SmallNodeDesc; then numeric columns TtlSmallNodes, TtlMidSizeNodes, TtlLargeNodes; and 3 similar numeric columns for Hosts, grouped by their respective descriptions. Something like:
LargeComputeUnitDesc | MidComputeUnitDesc | SmallComputeUnitDesc | TotalLargeNodes | TotalMidSizeNodes | TotalSmallNodes | TotalLargeHosts | TotalMidSizeHosts | TotalSmallHosts |
g3.4xlarge | C5 | t2.micro | 9 | 1 | 12 | 8 | 8 | 10 |
H1 | M4 | t2.nano | 7 | 2 | 14 | 2 | 7 | 1 |
D2 | X1e | etc. | 5 | 3 | 19 | 3 | 5 | 6 |
R5 | C4 | etc. | 2 | 4 | 9 | 8 | 6 | 8 |
R5n | R4 | etc. | 3 | 7 | 7 | 9 | 7 | 4 |
bowesmana,
Yes, this is sample code, since I can't post the real one due to privacy concerns. And yes, in the last clause TtlSmallHost should be TtlLargeHost, my mistake. I believe I am getting the point, though, which is:
- specify the fields I want to see in the report
- use stats early to group my data and limit the result set for Splunk to process
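Putting that together, applying the eventstats pattern above to the full Node + Host version might look something like this (a sketch using the field names from the sample SPL, untested against real data):
| inputlookup my_lookup
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
    dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
    by LargeComputeUnit MidComputeUnit SmallComputeUnit
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
    dc(eval(if(HostType="B",HostD,null()))) as TtlMidSizeHost
    by LargeComputeUnit MidComputeUnit
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
    dc(eval(if(HostType="B",HostD,null()))) as TtlLargeHost
    by LargeComputeUnit
| stats values(Ttl*) as Ttl* by LargeComputeUnit MidComputeUnit SmallComputeUnit
This reads the lookup once instead of three times and avoids both joins.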
Thank you for your help, I think I got the gist of it. Coming from a SQL background, I automatically used a join, but then realized it is not optimal in this case in Splunk.
😀 @kalibaba2021 yes - it's always easy to spot an 'SQLer' with the liberal use of join.
There are some really good examples in this forum about avoiding join. It's almost always possible to avoid a join by using some form of stats, but it can sometimes be difficult to imagine the data flowing through the Splunk pipeline to work out how to manipulate it to do that join with stats.
Another one to watch out for, and avoid if possible, is transaction. It has similarly difficult-to-spot issues if you exceed its limits, and it can often be avoided with stats.
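As an illustration of the transaction case, a grouping like this (the sourcetype and field names here are hypothetical):
sourcetype=web_session
| transaction session_id
can often be replaced with an equivalent stats that is not subject to transaction's memory and event limits:
sourcetype=web_session
| stats min(_time) as start max(_time) as end range(_time) as duration
    values(action) as actions
    by session_id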
A simple way to look at replacing join with stats is to compare
search 1
| join id [
search 2
]
and
search 1 OR search 2
| stats values(*) as * by id
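For example, to pair up login and logout events per session without a join (a sketch; the sourcetype, action, and session_id fields are hypothetical), stats accepts eval expressions inside its aggregate functions:
(sourcetype=access action=login) OR (sourcetype=access action=logout)
| stats min(eval(if(action="login",_time,null()))) as login_time
    max(eval(if(action="logout",_time,null()))) as logout_time
    by session_id
Each branch of the OR contributes rows to the same pipeline, and the by clause does the work the join key would have done.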
Happy Splunking!