Splunk Search

Full outer join

smolcj
Builder

How can I meet full outer join requirement in my search??
table a and table b with only one filed in two rows are same i want to display all rows from both the tables:
like , table a with 5 rows
and table b with 6 rows both with a common field thredname
and thread name is same in 2 rows. when i am using join type=outer i am getting 6 rows including the details from table a.
but my need is like i want 9 table where rows with common Threadname will have fully filled fields and other may have their own data.
example

Table a 
 Threadname  adata     
a1          aaa
a2          aa
a3          aa
ab1         aabb     
ab2         ab     



Table b

Threadname  bdata 
ab1         aabbb
ab2         aabbb
b1          bb
b2          bbb
b3          bbb
b4          bbb

Expected Resultant Table
Threadname  adata     bdata  
a1          aaa
a2          aa
a3          aa
ab1         aabb      aabbb
ab2         ab        aabbb
b1                    bb
b2                    bbb
b3                    bbb
b4                    bbb

please help
thank you

Tags (1)
1 Solution

sideview
SplunkTrust
SplunkTrust

The join command's outer join is actually a "left outer" join technically.

To do what you want, you should either

1) don't use join at all, but use append and then stats

index=main source=file1 | stats count by Threadname adata | append [ search index=main source=file2 | stats count by Threadname bdata] | stats values(adata) as adata values(bdata) as bdata by Threadname

2) better, do it in a single search pipeline.

index=main (source=file1 OR source=file2) | stats values(adata) as adata values(bdata) as bdata by Threadname

I understand what you're saying about having to extract fields differently, and do different search language for one side versus the other, but too often people assume there's no way to do such things in a single pipeline. I recommend posting the full details of the searches in a separate question, asking whether you can get the desired end result without using join or append.

View solution in original post

sideview
SplunkTrust
SplunkTrust

The join command's outer join is actually a "left outer" join technically.

To do what you want, you should either

1) don't use join at all, but use append and then stats

index=main source=file1 | stats count by Threadname adata | append [ search index=main source=file2 | stats count by Threadname bdata] | stats values(adata) as adata values(bdata) as bdata by Threadname

2) better, do it in a single search pipeline.

index=main (source=file1 OR source=file2) | stats values(adata) as adata values(bdata) as bdata by Threadname

I understand what you're saying about having to extract fields differently, and do different search language for one side versus the other, but too often people assume there's no way to do such things in a single pipeline. I recommend posting the full details of the searches in a separate question, asking whether you can get the desired end result without using join or append.

nick405060
Motivator

Also, some people looking at this solution may want to be using stats list not stats values here, depending on your outer join use case

0 Karma

nick405060
Motivator

Why are people complacent about outer join functionality? Shouldn't we be questioning the SPL development skills here? Why do they even provide you with the outer option, if it's just going to be a left outer join? I would logically assume that the word "outer" means "full outer" here, especially considering I know there is already a command for an outer left join...

I understand why stats is better than join but why even offer the outer option with join...?

0 Karma

jaxjohnny
Path Finder

This worked for me also! Thank you!

0 Karma

mad4wknds
Path Finder

Worked like a charm

0 Karma

davebrooking
Contributor

I think this is needs improving, but it does produce the results you've described.

I created a sourcetype of joinq that has 2 sources adata.txt and bdata.txt

The following search yields the table with 9 rows.

sourcetype=joinq source="adata.txt" | join threadname [search sourcetype=joinq source!="adata.txt"] | append [search sourcetype=joinq source="adata.txt" | join threadname type=outer [search sourcetype=joinq source!="adata.txt"]] | append [search sourcetype=joinq source!="adata.txt" | join threadname type=outer [search sourcetype=joinq source="adata.txt"]] | table threadname adata bdata | dedup threadname | sort threadname

Dave

0 Karma

kristian_kolb
Ultra Champion

Hmm, not sure I fully understand, but you might not require a join, but that depends on how you get your data...

If the 'tables' you mention are not tables, but sourcetypes, each row could be seen as an event, with adata and bdata being fields. Then something like;

... | stats first(adata) as adata first(bdata) as bdata by Threadname

would be something that could work.

You should probably give us a few sample events or a better description of your data.

/K

jaxjohnny
Path Finder

This worked for me . Thank you

0 Karma

jaxjohnny
Path Finder

This worked for me also. Thank you

0 Karma

smolcj
Builder

mmm i have to use a join for this search because i want to extract same field in two different column names so that i can differentiate it as data from different files so i need a join method that can provide full outer join values.. please help 😞

0 Karma

kristian_kolb
Ultra Champion

So.. then you could try:

index=main source=file1 OR source=file2 | stats first(adata) as adata first(bdata) as bdata by Threadname

assuming that there can be only one (or less) adata per Threadname, and only one (or less) bdata per Threadname.

smolcj
Builder

they are the values extracted from log files . Threadname adata and btada are the fields extracted from 2 different log files. table a from file1 and table b from file 2..
search may look like
index=main sorce=file1 |top Threadname adata | join type=outer Threadname [ search index=main sorce=file2 |top Threadname bdata]
but i am not able to obtain full outer join the output looks like
Threadname adata bdata

a1 aaa
a2 aa
a3 aa
ab1 aabb aabbb
ab2 ab aabbb

0 Karma

kristian_kolb
Ultra Champion

Thanks for that, but I was more interested in where the data comes from. Are they csv files, or the output of some splunk reporting command, or in separate events (i.e. table_a in one event, and table_b in another)?

/k

0 Karma

smolcj
Builder

Sorry if i confused you. edited the question. kindly please check it !!

0 Karma
Get Updates on the Splunk Community!

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...

Splunk Education Goes to Washington | Splunk GovSummit 2024

If you’re in the Washington, D.C. area, this is your opportunity to take your career and Splunk skills to the ...