Hey guys, I'm having trouble joining two datasets with similar values
I'm trying to join two datasets, both have a common "name" field, but the one on the left has the correct value and the one on the right has this pattern: left dataset name field + some characters
e.g.:
left dataset name | right dataset name |
RU3NDS | RU3NDS_sdsavdg_SoKdsVI3 |
Is there any way to use a wildcard when joining?
Sorry I didn't realize the string of interest was in value. (There was a big discussion about field name recently.) If so, regex is appropriate. This should work
| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined
Here I use joined instead of joined_name just to make field cleanup easier. Here is some mock data:
NAME | left_data_var | name | right_data_var |
leftbar1 | RU3NDS | ||
leftbar1 | SOMETHING | ||
leftbar2 | ELSE | ||
RU3NDS_abcd | rightbar1 | ||
RU3NDS_efgh | rightbar3 | ||
A_SOMETHING_abcd | rightbar2 | ||
SOMETHING_efgh | rightbar1 | ||
ELSE_bcde | rightbar2 | ||
A_RU3NDS_cdef | rightbar3 |
The result is
joined | left_data_var | right_data_var |
ELSE | leftbar2 | rightbar2 |
RU3NDS | leftbar1 | rightbar1 rightbar3 |
SOMETHING | leftbar1 | rightbar1 rightbar2 |
It is really important to illustrate data and desired output when asking a data analytics question. You could have saved everyone lots of time guessing.
This is a full emulation
| makeresults format=csv data="name, left_data_var
RU3NDS, leftbar1
SOMETHING, leftbar1
ELSE, leftbar2"
| append
[makeresults format=csv data="NAME, right_data_var
RU3NDS_abcd, rightbar1
RU3NDS_efgh, rightbar3
A_SOMETHING_abcd, rightbar2
SOMETHING_efgh, rightbar1
ELSE_bcde, rightbar2
A_RU3NDS_cdef, rightbar3"]
``` data emulation above ```
| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined
An alternative to regex is to use coalesce. For example,
| foreach RU3NDS_*
[eval RU3NDS = coalesce(RU3NDS, <<FIELD>>)]
As @gcusello mentioned, if you intend to use join command, consider stats or another method instead. For example,
| foreach RU3NDS_*
[eval RU3NDS = coalesce(RU3NDS, <<FIELD>>)]
| fields - RU3NDS_*
| stats values(*) as * dc(*) as dc_* by RU3NDS
Here is a complete emulation to illustrate how to correlate without using join command:
| makeresults format=csv data="RU3NDS, left_data_var
foo1, leftbar1
foo2, leftbar1
foo1, leftbar2
foo3, leftbar3"
| append
[makeresults format=csv data="RU3NDS_abcd, right_data_var
foo1, rightbar1
foo2, rightbar3
foo1, rightbar2
foo3, rightbar1"]
| append
[makeresults format=csv data="RU3NDS_efgh, right_data_var
foo1, rightbar3
foo2, rightbar1
foo1, rightbar3
foo3, rightbar2"]
``` data emulation above ```
| foreach RU3NDS_*
[eval RU3NDS = coalesce(RU3NDS, <<FIELD>>)]
| fields - RU3NDS_*
| stats values(*) as * dc(*) as dc_* by RU3NDS
RU3NDS | dc_left_data_var | dc_right_data_var | left_data_var | right_data_var |
foo1 | 2 | 3 | leftbar1 leftbar2 | rightbar1 rightbar2 rightbar3 |
foo2 | 1 | 2 | leftbar1 | rightbar1 rightbar3 |
foo3 | 1 | 2 | leftbar3 | rightbar1 rightbar2 |
My current search is something like this:
(index=1 sourcetype="x") OR (index=2 sourcetype="y" "some extra filters") "*(a name like RU3NDS just for testing but there are many like this one)*"
| eval joined_name = upper(coalesce(NAME, name))
NAME(upper) is from the index 1, and name(lower) from index 2
and this gave me table like this:
joined_name | others values from index 1 | others values from index 2 |
H-RU3NDS_DAT_CDSD231_01 | ... | |
H-RU3NDS_DAT_CDSD231_02 | ... | |
RU3NDS | ... |
The first two values are from index number 2 and the third is from index number 1
And I need to join this first column as a unique value like RU3NDS, I've tried rex command too, but didn't work
Hi @ViniciusMariano ,
good for you, see next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉
Sorry I didn't realize the string of interest was in value. (There was a big discussion about field name recently.) If so, regex is appropriate. This should work
| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined
Here I use joined instead of joined_name just to make field cleanup easier. Here is some mock data:
NAME | left_data_var | name | right_data_var |
leftbar1 | RU3NDS | ||
leftbar1 | SOMETHING | ||
leftbar2 | ELSE | ||
RU3NDS_abcd | rightbar1 | ||
RU3NDS_efgh | rightbar3 | ||
A_SOMETHING_abcd | rightbar2 | ||
SOMETHING_efgh | rightbar1 | ||
ELSE_bcde | rightbar2 | ||
A_RU3NDS_cdef | rightbar3 |
The result is
joined | left_data_var | right_data_var |
ELSE | leftbar2 | rightbar2 |
RU3NDS | leftbar1 | rightbar1 rightbar3 |
SOMETHING | leftbar1 | rightbar1 rightbar2 |
It is really important to illustrate data and desired output when asking a data analytics question. You could have saved everyone lots of time guessing.
This is a full emulation
| makeresults format=csv data="name, left_data_var
RU3NDS, leftbar1
SOMETHING, leftbar1
ELSE, leftbar2"
| append
[makeresults format=csv data="NAME, right_data_var
RU3NDS_abcd, rightbar1
RU3NDS_efgh, rightbar3
A_SOMETHING_abcd, rightbar2
SOMETHING_efgh, rightbar1
ELSE_bcde, rightbar2
A_RU3NDS_cdef, rightbar3"]
``` data emulation above ```
| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined
Hi @ViniciusMariano ,
if the structure of the right dataset name is always the same, e.g. :
left_dataset_name + _ + other,
you could extract the common part using a regex:
<your_search>
| rex field=right_dataset_name "^(?<left_dataset_name>[^_]+)"
| stats
values(right_dataset_name) AS right_dataset_name
count
BY right_dataset_name
| table left_dataset_name right_dataset_name
I would suppose that you used the join concept and not the use of the join command, that should be avoided because it's very slow and expensive for the system resources.
Ciao.
Giuseppe