Solved: Join similar fields

ViniciusMariano · ‎05-20-2024

Hey guys, I'm having trouble joining two datasets with similar values

I'm trying to join two datasets, both have a common "name" field, but the one on the left has the correct value and the one on the right has this pattern: left dataset name field + some characters

e.g.:

left dataset name	right dataset name
RU3NDS	RU3NDS_sdsavdg_SoKdsVI3

Is there any way to use a wildcard when joining?

yuanliu · ‎05-21-2024

Sorry I didn't realize the string of interest was in value. (There was a big discussion about field name recently.) If so, regex is appropriate. This should work

| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined

Here I use joined instead of joined_name just to make field cleanup easier. Here is some mock data:

NAME	left_data_var	name	right_data_var
	leftbar1	RU3NDS
	leftbar1	SOMETHING
	leftbar2	ELSE
RU3NDS_abcd			rightbar1
RU3NDS_efgh			rightbar3
A_SOMETHING_abcd			rightbar2
SOMETHING_efgh			rightbar1
ELSE_bcde			rightbar2
A_RU3NDS_cdef			rightbar3

The result is

joined	left_data_var	right_data_var
ELSE	leftbar2	rightbar2
RU3NDS	leftbar1	rightbar1 rightbar3
SOMETHING	leftbar1	rightbar1 rightbar2

It is really important to illustrate data and desired output when asking a data analytics question. You could have saved everyone lots of time guessing.

This is a full emulation

| makeresults format=csv data="name, left_data_var
RU3NDS, leftbar1
SOMETHING, leftbar1
ELSE, leftbar2"
| append
    [makeresults format=csv data="NAME, right_data_var
RU3NDS_abcd, rightbar1
RU3NDS_efgh, rightbar3
A_SOMETHING_abcd, rightbar2
SOMETHING_efgh, rightbar1
ELSE_bcde, rightbar2
A_RU3NDS_cdef, rightbar3"]
``` data emulation above ```
| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined

View solution in original post

yuanliu · ‎05-20-2024

An alternative to regex is to use coalesce. For example,

| foreach RU3NDS_*
    [eval RU3NDS = coalesce(RU3NDS, <<FIELD>>)]

As @gcusello mentioned, if you intend to use join command, consider stats or another method instead. For example,

| foreach RU3NDS_*
    [eval RU3NDS = coalesce(RU3NDS, <<FIELD>>)]
| fields - RU3NDS_*
| stats values(*) as * dc(*) as dc_* by RU3NDS

Here is a complete emulation to illustrate how to correlate without using join command:

| makeresults format=csv data="RU3NDS, left_data_var
foo1, leftbar1
foo2, leftbar1
foo1, leftbar2
foo3, leftbar3"
| append
    [makeresults format=csv data="RU3NDS_abcd, right_data_var
foo1, rightbar1
foo2, rightbar3
foo1, rightbar2
foo3, rightbar1"]
| append 
    [makeresults format=csv data="RU3NDS_efgh, right_data_var
foo1, rightbar3
foo2, rightbar1
foo1, rightbar3
foo3, rightbar2"]
``` data emulation above ```
| foreach RU3NDS_*
    [eval RU3NDS = coalesce(RU3NDS, <<FIELD>>)]
| fields - RU3NDS_*
| stats values(*) as * dc(*) as dc_* by RU3NDS

RU3NDS	dc_left_data_var	dc_right_data_var	left_data_var	right_data_var
foo1	2	3	leftbar1 leftbar2	rightbar1 rightbar2 rightbar3
foo2	1	2	leftbar1	rightbar1 rightbar3
foo3	1	2	leftbar3	rightbar1 rightbar2

ViniciusMariano · ‎05-21-2024

My current search is something like this:

(index=1 sourcetype="x") OR (index=2 sourcetype="y" "some extra filters") "*(a name like RU3NDS just for testing but there are many like this one)*"
| eval joined_name = upper(coalesce(NAME, name))

NAME(upper) is from the index 1, and name(lower) from index 2

and this gave me table like this:

joined_name	others values from index 1	others values from index 2
H-RU3NDS_DAT_CDSD231_01		...
H-RU3NDS_DAT_CDSD231_02		...
RU3NDS	...

The first two values are from index number 2 and the third is from index number 1

And I need to join this first column as a unique value like RU3NDS, I've tried rex command too, but didn't work

gcusello · ‎05-22-2024

Hi @ViniciusMariano ,

good for you, see next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

yuanliu · ‎05-21-2024

Sorry I didn't realize the string of interest was in value. (There was a big discussion about field name recently.) If so, regex is appropriate. This should work

| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined

Here I use joined instead of joined_name just to make field cleanup easier. Here is some mock data:

NAME	left_data_var	name	right_data_var
	leftbar1	RU3NDS
	leftbar1	SOMETHING
	leftbar2	ELSE
RU3NDS_abcd			rightbar1
RU3NDS_efgh			rightbar3
A_SOMETHING_abcd			rightbar2
SOMETHING_efgh			rightbar1
ELSE_bcde			rightbar2
A_RU3NDS_cdef			rightbar3

The result is

joined	left_data_var	right_data_var
ELSE	leftbar2	rightbar2
RU3NDS	leftbar1	rightbar1 rightbar3
SOMETHING	leftbar1	rightbar1 rightbar2

It is really important to illustrate data and desired output when asking a data analytics question. You could have saved everyone lots of time guessing.

This is a full emulation

| makeresults format=csv data="name, left_data_var
RU3NDS, leftbar1
SOMETHING, leftbar1
ELSE, leftbar2"
| append
    [makeresults format=csv data="NAME, right_data_var
RU3NDS_abcd, rightbar1
RU3NDS_efgh, rightbar3
A_SOMETHING_abcd, rightbar2
SOMETHING_efgh, rightbar1
ELSE_bcde, rightbar2
A_RU3NDS_cdef, rightbar3"]
``` data emulation above ```
| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined

gcusello · ‎05-20-2024

Hi @ViniciusMariano ,

if the structure of the right dataset name is always the same, e.g. :

left_dataset_name + _ + other,

you could extract the common part using a regex:

<your_search>
| rex field=right_dataset_name "^(?<left_dataset_name>[^_]+)"
| stats 
     values(right_dataset_name) AS right_dataset_name
     count 
     BY right_dataset_name
| table left_dataset_name right_dataset_name

I would suppose that you used the join concept and not the use of the join command, that should be avoided because it's very slow and expensive for the system resources.

Ciao.

Giuseppe

Join similar fields

join

stats

subsearch

Index This | What gets bigger the more you remove?

Introducing the 2024 Splunk MVPs!

Thanks for the Memories! Splunk University, .conf24, and Community Connections