Splunk Search

Join similar fields

ViniciusMariano
Explorer

Hey guys, I'm having trouble joining two datasets with similar values

I'm trying to join two datasets, both have a common "name" field, but the one on the left has the correct value and the one on the right has this pattern: left dataset name field + some characters

e.g.:

left dataset nameright dataset name
RU3NDSRU3NDS_sdsavdg_SoKdsVI3

 

Is there any way to use a wildcard when joining?

Labels (3)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

Sorry I didn't realize the string of interest was in value. (There was a big discussion about field name recently.)  If so, regex is appropriate.  This should work

 

| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined

 

Here I use joined instead of joined_name just to make field cleanup easier.  Here is some mock data:

NAMEleft_data_varnameright_data_var
 leftbar1RU3NDS 
 leftbar1SOMETHING 
 leftbar2ELSE 
RU3NDS_abcd  rightbar1
RU3NDS_efgh  rightbar3
A_SOMETHING_abcd  rightbar2
SOMETHING_efgh  rightbar1
ELSE_bcde  rightbar2
A_RU3NDS_cdef  rightbar3

The result is

joinedleft_data_varright_data_var
ELSEleftbar2rightbar2
RU3NDSleftbar1
rightbar1
rightbar3
SOMETHINGleftbar1
rightbar1
rightbar2

It is really important to illustrate data and desired output when asking a data analytics question.  You could have saved everyone lots of time guessing.

This is a full emulation

 

| makeresults format=csv data="name, left_data_var
RU3NDS, leftbar1
SOMETHING, leftbar1
ELSE, leftbar2"
| append
    [makeresults format=csv data="NAME, right_data_var
RU3NDS_abcd, rightbar1
RU3NDS_efgh, rightbar3
A_SOMETHING_abcd, rightbar2
SOMETHING_efgh, rightbar1
ELSE_bcde, rightbar2
A_RU3NDS_cdef, rightbar3"]
``` data emulation above ```
| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined

 

 

View solution in original post

Tags (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

An alternative to regex is to use coalesce.  For example,

 

| foreach RU3NDS_*
    [eval RU3NDS = coalesce(RU3NDS, <<FIELD>>)]

 

As @gcusello mentioned, if you intend to use join command, consider stats or another method instead.  For example,

 

| foreach RU3NDS_*
    [eval RU3NDS = coalesce(RU3NDS, <<FIELD>>)]
| fields - RU3NDS_*
| stats values(*) as * dc(*) as dc_* by RU3NDS

 

Here is a complete emulation to illustrate how to correlate without using join command:

 

| makeresults format=csv data="RU3NDS, left_data_var
foo1, leftbar1
foo2, leftbar1
foo1, leftbar2
foo3, leftbar3"
| append
    [makeresults format=csv data="RU3NDS_abcd, right_data_var
foo1, rightbar1
foo2, rightbar3
foo1, rightbar2
foo3, rightbar1"]
| append 
    [makeresults format=csv data="RU3NDS_efgh, right_data_var
foo1, rightbar3
foo2, rightbar1
foo1, rightbar3
foo3, rightbar2"]
``` data emulation above ```
| foreach RU3NDS_*
    [eval RU3NDS = coalesce(RU3NDS, <<FIELD>>)]
| fields - RU3NDS_*
| stats values(*) as * dc(*) as dc_* by RU3NDS

 

RU3NDSdc_left_data_vardc_right_data_var
left_data_var
right_data_var
foo123
leftbar1
leftbar2
rightbar1
rightbar2
rightbar3
foo212leftbar1
rightbar1
rightbar3
foo312leftbar3
rightbar1
rightbar2
Tags (1)

ViniciusMariano
Explorer

My current search is something like this:

(index=1 sourcetype="x") OR (index=2 sourcetype="y" "some extra filters") "*(a name like RU3NDS just for testing but there are many like this one)*"
| eval joined_name = upper(coalesce(NAME, name)) 

NAME(upper) is from the index 1, and name(lower) from index 2

and this gave me table like this:

joined_nameothers values from index 1others values from index 2
H-RU3NDS_DAT_CDSD231_01 ...
H-RU3NDS_DAT_CDSD231_02 ...
RU3NDS... 

The first two values are from index number 2 and the third is from index number 1

And I need to join this first column as a unique value like RU3NDS, I've tried rex command too, but didn't work 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @ViniciusMariano ,

good for you, see next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Sorry I didn't realize the string of interest was in value. (There was a big discussion about field name recently.)  If so, regex is appropriate.  This should work

 

| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined

 

Here I use joined instead of joined_name just to make field cleanup easier.  Here is some mock data:

NAMEleft_data_varnameright_data_var
 leftbar1RU3NDS 
 leftbar1SOMETHING 
 leftbar2ELSE 
RU3NDS_abcd  rightbar1
RU3NDS_efgh  rightbar3
A_SOMETHING_abcd  rightbar2
SOMETHING_efgh  rightbar1
ELSE_bcde  rightbar2
A_RU3NDS_cdef  rightbar3

The result is

joinedleft_data_varright_data_var
ELSEleftbar2rightbar2
RU3NDSleftbar1
rightbar1
rightbar3
SOMETHINGleftbar1
rightbar1
rightbar2

It is really important to illustrate data and desired output when asking a data analytics question.  You could have saved everyone lots of time guessing.

This is a full emulation

 

| makeresults format=csv data="name, left_data_var
RU3NDS, leftbar1
SOMETHING, leftbar1
ELSE, leftbar2"
| append
    [makeresults format=csv data="NAME, right_data_var
RU3NDS_abcd, rightbar1
RU3NDS_efgh, rightbar3
A_SOMETHING_abcd, rightbar2
SOMETHING_efgh, rightbar1
ELSE_bcde, rightbar2
A_RU3NDS_cdef, rightbar3"]
``` data emulation above ```
| eventstats values(name) as left_name
| eval match_name = mvmap(left_name, mvappend(match_name, if(match(NAME, "^(.+_)*" . left_name . "_"), left_name, null())))
| eval joined = coalesce(name, match_name)
| fields - *name NAME
| stats values(*) as * by joined

 

 

Tags (1)
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @ViniciusMariano ,

if the structure of the right dataset name is always the same, e.g. :

left_dataset_name + _ + other,

you could extract the common part using a regex:

<your_search>
| rex field=right_dataset_name "^(?<left_dataset_name>[^_]+)"
| stats 
     values(right_dataset_name) AS right_dataset_name
     count 
     BY right_dataset_name
| table left_dataset_name right_dataset_name

I would suppose that you used the join concept and not the use of the join command, that should be avoided because it's very slow and expensive for the system resources.

Ciao.

Giuseppe

Get Updates on the Splunk Community!

Index This | What gets bigger the more you remove?

June 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this month’s ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Thanks for the Memories! Splunk University, .conf24, and Community Connections

Thank you to everyone in the Splunk Community who joined us for .conf24 – starting with Splunk University and ...