Splunk Search

Emulating FULL OUTER JOIN

andreafebbo
Communicator

Hi,
I have to tables:

ID name
1..A
2..B

ID Error
1..bla1
1..bla2

so Id like a table which is like the following

ID name error
1..A........bla1
1..A........bla2
2..B........(empty)

In SQL this is called FULL OUTER JOIN but i cannot find a way to replicate it since the parameter

type=outer

in the command join means LEFT JOIN in the reality.

I found some questions that try to resolve the issue with the search command but in my case it does not work.

Thank you

sandrosov_splun
Splunk Employee
Splunk Employee

The author needed a left join as mentioned before, but I see demand for this task is still high.

I spent a little bit of time on this. So, try the following.

| inputlookup tableA | eval key=localfield1
| append [|inputlookup tableB | eval key=localfield2]
| selfjoin max=0 keepsingle=yes key

 

carbdb
Explorer

absolutely great, because it is also expandable to more than two datasets to be compared. here's what i done for a "triff":

index=a | stats count by PSP_Element | rename * as a_* | rename a_PSP_Element as PSP_Element | eval a=1
| append [ | inputlookup koop_aws_spitz_zahlen PSP_Element=* | rename * as b_* | rename b_PSP_Element as PSP_Element | eval b=1 ]
| selfjoin max=0 keepsingle=yes PSP_Element

| append [ | makeresults format=csv data="PSP_Element
F-330751
F-330755
F-330758
"

| rename * as c_* | rename c_PSP_Element as PSP_Element | eval c=1 ]
| selfjoin max=0 keepsingle=yes PSP_Element

| where isnull(a) OR isnull(b) OR isnull(c)

| eval x = coalesce(b_PLAN,a_Preis) | sort - x | fields - x

| fields PSP_Element c a b *
0 Karma

tdiestel
Path Finder

This seems like you just want to do a Left join. so

index=A|table ID Name | join type=left max=0 ID [search index=B |table ID Error]

arlington
Explorer

Thanks, man, that's indeed the great answer, which helped me a lot. But what is the max=0 indicates here? Event If there is not any data in the index B, still it shows 1, without max=0 it calculates all of them as 1.

0 Karma

cmerriman
Super Champion

try something like this, maybe:

UPDATED

 index=MYINDEX source=MYSOURCE 
                   | eval Day = strftime(_time,"%F")
                   | eval DateTime = strftime(_time,"%Y-%m-%d %H:%M:%S")
                   | stats Latest(LogType) as status Latest(Result) as Result Latest(DateTime) as When by PackageName  ,ExecutionInstanceGUID, Day
                   | sort When
                   | streamstats  count as "Execution Nr" by PackageName, Day
                   | sort - When
                   | table When, PackageName, "Execution Nr",  status, Result, ExecutionInstanceGUID
                   | eval AlertLevel = case(Result=="OK",1,Result=="WARNING",2,Result=="KO",3)
                   | rangemap field=AlertLevel low=1-1 elevated=2-2 severe=3-3 default=guarded
                   | fields - AlertLevel
|appendpipe [stats count by ExecutionInstanceGUID | join type=outer ExecutionInstanceGUID [
                   search index=MYINDEX source=MYSOURCE
                   | rename ExplodedPackages{}.Error AS Error, ExplodedPackages{}.Package AS Package, ExplodedPackages{}.TimeStamp AS TimeStamp
                   | eval x=mvzip(TimeStamp,mvzip(Package,Error))
                   | mvexpand x
                   | eval y=mvzip(ExecutionInstanceGUID,x)
                   | mvexpand y
                   | eval z=split(y,",") 
                   | eval ExecutionInstanceGUID=mvindex(z,0)
                   | eval TimeStamp=mvindex(z,1)
                   | eval Package=mvindex(z,2)
                   | eval Error=mvindex(z,3)
                   | table ExecutionInstanceGUID,TimeStamp, Package, Error]]
|stats values(PackageName) as PackageName by ExecutionInstanceGUID Error
0 Karma

andreafebbo
Communicator

THe result of this query is the following:

1..A
2..B
1.......bla1
1.......bla2

and not:

1..A..bla1
1..A..bla2
2..B........

0 Karma

cmerriman
Super Champion

try my updated syntax. I used a stats command at the bottom to bring back the values of the PackageName by the ID and Error, so it should join the 1s and bla1/bla2s.

0 Karma

andreafebbo
Communicator

Nothing: now it shows just 3 columns:
ExecutionInstanceGUID
Error
PackageName (empty column).

In the example, what i call the column name(A, B) is one column but in the reality are many columns(as you can see in the query). I need all of them

0 Karma

cmerriman
Super Champion

you might need to play with the stats command at the bottom, join by any of the column names that might be in common, or do a latest instead of values.

|stats values(*) as * by ExecutionInstanceGUID 
0 Karma

cmerriman
Super Champion

can you provide syntax with how you get current tables? You might be able to use appendpipe.

0 Karma

davebrooking
Contributor

I understand your requirement is more involved, but given the basic datasets from above the following search produces similar results to that shown:

index=MYINDEX source=MYSOURCE | stats values(error) as error values(name) as name  by id | eval error=if(isnull(error),"NULL",error)  | mvexpand error | table id name error

Dave

0 Karma

andreafebbo
Communicator
index=MYINDEX source=MYSOURCE 
                  | eval Day = strftime(_time,"%F")
                  | eval DateTime = strftime(_time,"%Y-%m-%d %H:%M:%S")
                  | stats Latest(LogType) as status Latest(Result) as Result Latest(DateTime) as When by PackageName  ,ExecutionInstanceGUID, Day
                  | sort When
                  | streamstats  count as "Execution Nr" by PackageName, Day
                  | sort - When
                  | table When, PackageName, "Execution Nr",  status, Result, ExecutionInstanceGUID
                  | eval AlertLevel = case(Result=="OK",1,Result=="WARNING",2,Result=="KO",3)
                  | rangemap field=AlertLevel low=1-1 elevated=2-2 severe=3-3 default=guarded
                  | fields - AlertLevel
                  | join type=outer ExecutionInstanceGUID[
                         search index=MYINDEX source=MYSOURCE
                         | rename ExplodedPackages{}.Error AS Error, ExplodedPackages{}.Package AS Package, ExplodedPackages{}.TimeStamp AS TimeStamp
                         | eval x=mvzip(TimeStamp,mvzip(Package,Error))
                         | mvexpand x
                         | eval y=mvzip(ExecutionInstanceGUID,x)
                         | mvexpand y
                         | eval z=split(y,",") 
                         | eval ExecutionInstanceGUID=mvindex(z,0)
                         | eval TimeStamp=mvindex(z,1)
                         | eval Package=mvindex(z,2)
                         | eval Error=mvindex(z,3)
                         | table ExecutionInstanceGUID,TimeStamp, Package, Error
                  ]

With this syntax I get just one error for each row in the first table

0 Karma
Get Updates on the Splunk Community!

Preparing your Splunk Environment for OpenSSL3

The Splunk platform will transition to OpenSSL version 3 in a future release. Actions are required to prepare ...

Unleash Unified Security and Observability with Splunk Cloud Platform

     Now Available on Microsoft AzureThursday, March 27, 2025  |  11AM PST / 2PM EST | Register NowStep boldly ...

Splunk AppDynamics with Cisco Secure Application

Web applications unfortunately present a target rich environment for security vulnerabilities and attacks. ...