Splunk Search

Outer Join in Splunk

afurze
Engager

All,

I've been trying to find a solution for this for a few days.  We have multiple tools sending data in on their coverage and we would like to have a search that will show hosts which exist in one but not the other, in SQL terms, an OUTER JOIN.

I have found that Splunk doesn't support a true outer join, so I'm still searching for a solution.

Edit: spelling

Labels (1)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @afurze,

Splunk isn't a db!

it has the join command but it's better to avoid it because it's very slow!

You could have two approaches:

  1. if you haven't many events from the second host (less than 50,000) you can use the subsearch;
  2. if you have many events (more than 50,000) you have to use stats.

so if you have events in different indexes (index_A or index_B)

in the first case:

index=index_A NOT [ search index=index_B | dedup host | fields host]
| ...

In the second case:

index=index_ OR index=index_B
| stats dc(index) AS dc_index values(index) AS index BY host
| where dc_index=1 AND index=index_A

If you have your data all in the same index, you have to separate events using the sourcetype or another field.

Ciao.

Giuseppe

 

View solution in original post

gcusello
SplunkTrust
SplunkTrust

Hi @afurze,

Splunk isn't a db!

it has the join command but it's better to avoid it because it's very slow!

You could have two approaches:

  1. if you haven't many events from the second host (less than 50,000) you can use the subsearch;
  2. if you have many events (more than 50,000) you have to use stats.

so if you have events in different indexes (index_A or index_B)

in the first case:

index=index_A NOT [ search index=index_B | dedup host | fields host]
| ...

In the second case:

index=index_ OR index=index_B
| stats dc(index) AS dc_index values(index) AS index BY host
| where dc_index=1 AND index=index_A

If you have your data all in the same index, you have to separate events using the sourcetype or another field.

Ciao.

Giuseppe

 

afurze
Engager

This is exactly what I need, thanks!

0 Karma
Get Updates on the Splunk Community!

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Hello Splunk Community,   We're thrilled to share an exciting update that will help you manage your data more ...

Splunk MCP & Agentic AI: Machine Data Without Limits

Discover how the Splunk Model Context Protocol (MCP) Server can revolutionize the way your organization uses ...

Application management with Targeted Application Install for Victoria Experience

Experience a new era of flexibility in managing your Splunk Cloud Platform apps! With Targeted Application ...