Solved: Re: How to perform "custom" inner join?

designer46 · ‎03-11-2021

Hi everyone,

See if someone could give me a hand. My scenario is similar to this:

Table 1

ID	ID2	Whatever rest columns...
1	AA	...
2	FC	...
3	OM	...
1	BB	...
1	MQ	...

Table 2

ID	ID2	Whatever rest columns...
1	AA, BB	...
2	FC	...
3	OM	...
4	BB	...
5	MQ	...

You see that I have two identifiers. The first table we could say is a collection of logs, while the second one is the most updated inventory. What I would like to do is perform an 'inner join' so I only get the rows from Table 1 whose ID1 and ID2 exists in Table 2. Resulting in this:

ID	ID2	Whatever rest columns...
1	AA	...
2	FC	...
3	OM	...
1	BB	...

The row with ID 1 and ID2 MQ will be removed.

Thanks for your help 😄

gcusello · ‎03-11-2021

Hi @designer46,

I usualy try to do everything in Splunk because it's easier!

Anyway, if you have more than 50,000 results in table2, you cannot use solution 3 but solution 4 adding a row that I forgot in my last message:

index=index_table1 OR index=index_table2 
| stats dc(index) AS dc_index values(index) AS index values(other_fields1) AS other_fields1 values(other_fields2) AS other_fields2 BY ID ID2
| where dc_index=1 AND index=index_table1 
| table ID ID2 other_fields1

Sorry!

Ciao.

Giuseppe

View solution in original post

gcusello · ‎03-11-2021

Hi @designer46,

at first: Splunk has inner join but it's better to avoid this command because it's very slow!

Splunk isn't a DB!

There are three conditions to know:

table2 is a static table or events?
have you more THAN 50,000 events or less?
do you need infos from table2 or not?

if the table2 is a static table, you could use a lookup otherwise you have to use a search

1) if you have a static table and you don't want fields from table2:

index=index_table1 [ | inputlookup lookup_table2 | fields ID ID2 ]
| table ID ID2 other_fields1

2) if you have a static table and you need fields from table2:

index=index_table1 
| lookup lookup_table2 ID ID2 OUTPUT other_fields2
| search other_fields2=*
| table ID ID2 other_fields1 other_fields2

3) if you have a dinamic table (events) and don't want fields from table2:

index=index_table1 [ search index=index_table2 | fields ID ID2 ]
| table ID ID2 other_fields1

4) if you have a dinamic table (events) and want fields from table2:

index=index_table1 OR index=index_table2 
| stats dc(index) AS dc_index values(index) AS index values(other_fields1) AS other_fields1 values(other_fields2) AS other_fields2 BY ID ID2
| table ID ID2 other_fields1 other_fields2

If you have more than 50,000 rows in table2 there's a problem because you cannot use lookups and subsearches.

Ciao.

Giuseppe

designer46 · ‎03-11-2021

Thank you Giuseppe,

My particular case would be the third one. Table 2 is a dynamic table (events) and I don't need any fields from Table 2. However, I do have more than 50.000 records.

I am pretty much new to Splunk. Would you say that it's better that I perform this kind of operations outside of Splunk? Using other tools more appropriate?

Thank you once again.

gcusello · ‎03-11-2021

Hi @designer46,

I usualy try to do everything in Splunk because it's easier!

Anyway, if you have more than 50,000 results in table2, you cannot use solution 3 but solution 4 adding a row that I forgot in my last message:

index=index_table1 OR index=index_table2 
| stats dc(index) AS dc_index values(index) AS index values(other_fields1) AS other_fields1 values(other_fields2) AS other_fields2 BY ID ID2
| where dc_index=1 AND index=index_table1 
| table ID ID2 other_fields1

Sorry!

Ciao.

Giuseppe

gcusello · ‎03-15-2021

Hi @designer46,

good for you!

Ciao and happy splunking.

Giuseppe.

P.S.: Karma Points are appreciated 😉

How to perform "custom" inner join?

join

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics!

New in Observability Cloud - Explicit Bucket Histograms