- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Left Outer Join in Splunk
Below is our Requirement
Lookup file has just one column DatabaseName, this is the left dataset
DatabaseName |
A |
B |
C |
My Search is for metrics on databases and ha
s multiple rows, this is the right dataset
DatabaseName | Instance | CPUUtilization |
A | A1 | 10 |
A | A2 | 20 |
C | C1 | 40 |
C | C2 | 50 |
D | D | 60 |
Expected Result is this after left join
DatabaseName | Instance | CPUUtilization |
A | A1 | 10 |
A | A2 | 20 |
B | NULL | NULL |
C | C1 | 40 |
C | C2 | 50 |
But when I join using DatabaseName, I am getting only three records, 1 for A, 1 for B with NULL and 1 for C
My background is SQL and for me left join is all from left data set and all matching from right data set. So please suggest me how I can achive this.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi
one old answer which describe how joins can/should do with splunk https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi...
r. Ismo
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Something like this should work. I called the lookup db_names.csv ... change that to whatever your actual lookup is named. Everything above the comment just emulates the data you gave.
| makeresults count=1
| eval _raw="DatabaseName,Instance,CPUUtilization
A,A1,10
A,A2,20
C,C1,40
C,C2,50
D,D,60"
| multikv forceheader=1
| fields - _time, _raw, linecount
```^^^^ This emulates the data you gave ^^^^```
| eval inst_cpu=Instance+"#"+CPUUtilization
| fields - Instance CPUUtilization
| inputlookup db_names.csv append=true ```<-- change the lookup name here```
| stats list(inst_cpu) as inst_cpu by DatabaseName
| mvexpand inst_cpu
| eval Instance=mvindex(split(inst_cpu,"#"), 0)
| eval CPUUtilization=mvindex(split(inst_cpu,"#"), 1)
| fillnull value="NULL" Instance CPUUtilization
| fields - inst_cpu
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content


The join command is an inefficient way to combine datasets. Alternative commands are described in the Search Reference manual (https://docs.splunk.com/Documentation/Splunk/9.1.1/SearchReference/Join#Alternative_commands).
Splunk has a manual for SQL users. See https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/SQLtoSplunk
If this reply helps you, Karma would be appreciated.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

For your requirement, left join may not be ideal. Try this alternate implementation (replace make results query with your lookup/data query):
| makeresults | eval DatabaseName=split("A B C"," ") | mvexpand DatabaseName | table DatabaseName | eval from="Lookup" | append [| makeresults | eval DatabaseName=split("A A1 10#A A2 20#C C1 40#C C2 50#D D 60","#") | mvexpand DatabaseName | table DatabaseName | rex field=DatabaseName "^(?<DatabaseName>\S+)\s+(?<Instance>\S+)\s+(?<CPUUtilization>\S+)$" | eval from="search"] | eventstats values(from) as from by DatabaseName | where isnotnull(mvfilter(match(from,"Lookup"))) | foreach CPUUtilization Instance [| eval "<<FIELD>>"=coalesce('<<FIELD>>',if(mvcount(from)=1 AND from="Lookup","NULL",null()))] | stats count by DatabaseName CPUUtilization Instance | table DatabaseName CPUUtilization Instance
