Hi everyone,
See if someone could give me a hand. My scenario is similar to this:
Table 1
ID | ID2 | Whatever rest columns... |
1 | AA | ... |
2 | FC | ... |
3 | OM | ... |
1 | BB | ... |
1 | MQ | ... |
Table 2
ID | ID2 | Whatever rest columns... |
1 | AA, BB | ... |
2 | FC | ... |
3 | OM | ... |
4 | BB | ... |
5 | MQ | ... |
You see that I have two identifiers. The first table we could say is a collection of logs, while the second one is the most updated inventory. What I would like to do is perform an 'inner join' so I only get the rows from Table 1 whose ID1 and ID2 exists in Table 2. Resulting in this:
ID | ID2 | Whatever rest columns... |
1 | AA | ... |
2 | FC | ... |
3 | OM | ... |
1 | BB | ... |
The row with ID 1 and ID2 MQ will be removed.
Thanks for your help 😄
Hi @designer46,
I usualy try to do everything in Splunk because it's easier!
Anyway, if you have more than 50,000 results in table2, you cannot use solution 3 but solution 4 adding a row that I forgot in my last message:
index=index_table1 OR index=index_table2
| stats dc(index) AS dc_index values(index) AS index values(other_fields1) AS other_fields1 values(other_fields2) AS other_fields2 BY ID ID2
| where dc_index=1 AND index=index_table1
| table ID ID2 other_fields1
Sorry!
Ciao.
Giuseppe
Hi @designer46,
at first: Splunk has inner join but it's better to avoid this command because it's very slow!
Splunk isn't a DB!
There are three conditions to know:
if the table2 is a static table, you could use a lookup otherwise you have to use a search
1) if you have a static table and you don't want fields from table2:
index=index_table1 [ | inputlookup lookup_table2 | fields ID ID2 ]
| table ID ID2 other_fields1
2) if you have a static table and you need fields from table2:
index=index_table1
| lookup lookup_table2 ID ID2 OUTPUT other_fields2
| search other_fields2=*
| table ID ID2 other_fields1 other_fields2
3) if you have a dinamic table (events) and don't want fields from table2:
index=index_table1 [ search index=index_table2 | fields ID ID2 ]
| table ID ID2 other_fields1
4) if you have a dinamic table (events) and want fields from table2:
index=index_table1 OR index=index_table2
| stats dc(index) AS dc_index values(index) AS index values(other_fields1) AS other_fields1 values(other_fields2) AS other_fields2 BY ID ID2
| table ID ID2 other_fields1 other_fields2
If you have more than 50,000 rows in table2 there's a problem because you cannot use lookups and subsearches.
Ciao.
Giuseppe
Thank you Giuseppe,
My particular case would be the third one. Table 2 is a dynamic table (events) and I don't need any fields from Table 2. However, I do have more than 50.000 records.
I am pretty much new to Splunk. Would you say that it's better that I perform this kind of operations outside of Splunk? Using other tools more appropriate?
Thank you once again.
Hi @designer46,
I usualy try to do everything in Splunk because it's easier!
Anyway, if you have more than 50,000 results in table2, you cannot use solution 3 but solution 4 adding a row that I forgot in my last message:
index=index_table1 OR index=index_table2
| stats dc(index) AS dc_index values(index) AS index values(other_fields1) AS other_fields1 values(other_fields2) AS other_fields2 BY ID ID2
| where dc_index=1 AND index=index_table1
| table ID ID2 other_fields1
Sorry!
Ciao.
Giuseppe