Splunk Search

How to avoid a join in a lookup search?

kalibaba2021
Path Finder

I have 3 searches executing against same lookup, and since each lookup needs to be grouped by different set of fields, my search joins each result to the previous one.

I have a feeling this is not optimal, and want to rewrite it using stats , but don't know where to begin.


I want to create a report for Total number of Nodes per Node Type and Description, and this does what I want, but I am looking to optimize. Any ideas?     

Code:

| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
by LargeNodeDesc MidSizeNodeDesc
]

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
by LargeNodeDesc
]

 

 

Labels (4)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

Can you post some of your lookup file - The NodeType='A' and 'B" stuff is confusing and the SPL doesn't seem to be correct, e.g. the last part of the SPL does

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit
]

 but I suspect that the TtlSmallHost should be TtlLargeHost, so this seems to be pseudo code, not the real SPL

 

View solution in original post

0 Karma

bowesmana
SplunkTrust
SplunkTrust

If you need 3 splits, then you can use eventstats 3 times with a single inputlookup

| inputlookup my_lookup
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
by LargeNodeDesc MidSizeNodeDesc
| eventstats dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
by LargeNodeDesc
| stats values(Ttl*) as Ttl* by LargeNodeDesc MidSizeNodeDesc SmallNodeDesc

I'm not sure what your input and therefore expected output is intended to look like, can you give an example of what the lookup holds and what you would see with your existing query

0 Karma

kalibaba2021
Path Finder

Thanks for replying bowesmana,

There's actually another Compute unit I forgot named Host, but essentially with same logic.  SPL is:

| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlSmallNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit MidComputeUnit SmallComputeUnit

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlMidSizeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlMidSizeHost
by LargeComputeUnit MidComputeUnit
]

| join type=left

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit
]

I expect to see 3 Description text columns: LargeNodeDesc MidSizeNodeDesc SmallNodeDesc , then numeric columns TtlSmallNodes , TtlMidSizeNodes, TtlLargeNodes, and similar 3 numeric columns for Hosts, grouped by their respective descriptions. Something like: 

LargeComputeUnitDesc

MidComputeUnitDesc

SmallComputeUnitDesc

TotalLargeNode

TatalMidSizeNodes

TotalSmalNodes

TotalLargeHosts

TatalMidSizeHosts

TotalSmalHosts

g3.4xlarge

C5

t2.micro

9

1

12

8

8

10

H1

M4

t2.nano

7

2

14

2

7

1

D2

X1e

etc.

5

3

19

3

5

6

R5

C4

etc.

2

4

9

8

6

8

R5n

R4

etc.

3

7

7

9

7

4

Tags (3)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Can you post some of your lookup file - The NodeType='A' and 'B" stuff is confusing and the SPL doesn't seem to be correct, e.g. the last part of the SPL does

[| inputlookup my_lookup
| stats
dc(eval(if(NodeType="A",NodeID,null()))) as TtlLargeNodes
dc(eval(if(HostType="B",HostD,null()))) as TtlSmallHost
by LargeComputeUnit
]

 but I suspect that the TtlSmallHost should be TtlLargeHost, so this seems to be pseudo code, not the real SPL

 

0 Karma

kalibaba2021
Path Finder

bowesmana,

Yes, this is sample code since I can't post the real one due to privacy concerns.  And yes, in the last clause the TtlSmallHost should be TtlLargeHost, my mistake.  I believe I am getting the point though, which is:

 specify the fields I want to see in the report 

use stats early to group my data and limit the result set for splunk to process.

Thank you for your help, I think I got the gist of it. Coming from a sql background I automatically used a join but then realized it is not optimal in this case in splunk.

bowesmana
SplunkTrust
SplunkTrust

😀 @kalibaba2021 yes - it's always easy to spot an 'SQLer' with the liberal use of join.

There are some really good examples in this forum about avoiding join in many cases. It's almost always possible to avoid a join by using some form of stats, but it can sometimes be difficult to imagine the data flowing through the pipeline in Splunk to work out how to manipulate the data to do that join_with_stats.

Another one to watch out for and avoid if possible is 'transaction'. That has similar difficult to spot issues if you exceed limits and can often be avoided with stats.

A simple way to look at join with stats is to compare

search 1
| join id [
  search 2
]

and

search 1 OR search 2
| stats values(*) as * by id

Happy Splunking!

 

0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...