Splunk Search

can the join command only give me inner joins and left outer joins?

SplunkTrust
SplunkTrust

I actually need a right join in some cases.

I know im not supposed to use joins at all, and wherever possible use a disjunction plus stats, or use a lookup, because these are faster, better, cheaper, awesomer etc.
(http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717)

However sometimes there's just no other way. One side of my data comes from a search and the other side comes from inputlookup. So i cant just glue together two sets of 'events' in a single search with some OR's and stitch them back together with stats count by foo later. .

Anyway, proceeding with join, I have 2 searches that return events with a field called sourceHost

search <somewhat expensive search>
inputlookup foo.csv

Following best practices with join, the cheaper smaller search goes inside the brackets:

search <somewhat expensive search> | join common_field [inputlookup foo.csv]

"inner" is the default type, so if rows are in left side and not in the right side, or in the right side and not on the left side, they'll be dropped.

the docs say that type="outer" and type="left" are synonymous. http://www.splunk.com/base/Documentation/latest/SearchReference/Join

and as far as I can tell there's no type="right". Is there another way?

Tags (2)
0 Karma
1 Solution

SplunkTrust
SplunkTrust

One workaround is to use the append command and then much like the disjunction trick (with the OR's) you glue the two sets back together with a stats. Say you have a field called 'lastConnected' in your lookup table, and some stats coming back in your outer search like sum(kb) you can do this.

 search foo | stats sum(kb) as KB by sourceHost | append [inputlookup foo.csv] | stats first(KB) as KB first(lastConnected) as lastConnected by sourceHost

View solution in original post

Splunk Employee
Splunk Employee

Yeah. technically our "outer" is "left outer", not "right outer". There is also no "full".

0 Karma

SplunkTrust
SplunkTrust

One workaround is to use the append command and then much like the disjunction trick (with the OR's) you glue the two sets back together with a stats. Say you have a field called 'lastConnected' in your lookup table, and some stats coming back in your outer search like sum(kb) you can do this.

 search foo | stats sum(kb) as KB by sourceHost | append [inputlookup foo.csv] | stats first(KB) as KB first(lastConnected) as lastConnected by sourceHost

View solution in original post

SplunkTrust
SplunkTrust

oh nice! yea that's much better. Were you to write it up as an answer I'd probably pick it and delete/merge my own. 😃

0 Karma

Splunk Employee
Splunk Employee

You can also avoid using a subsearch entirely in this case (along with it's attendant size and time limits etc) by using the append=t option on | inputlookup instead: search foo | stats sum(kb) as KB by sourceHost | inputlookup append=t foo.csv | stats first(KB) lastConnected by sourceHost

0 Karma