I actually need a right join in some cases.
I know im not supposed to use joins at all, and wherever possible use a disjunction plus stats, or use a lookup, because these are faster, better, cheaper, awesomer etc.
(http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717)
However sometimes there's just no other way. One side of my data comes from a search and the other side comes from inputlookup. So i cant just glue together two sets of 'events' in a single search with some OR's and stitch them back together with stats count by foo
later. .
Anyway, proceeding with join, I have 2 searches that return events with a field called sourceHost
search <somewhat expensive search>
inputlookup foo.csv
Following best practices with join, the cheaper smaller search goes inside the brackets:
search <somewhat expensive search> | join common_field [inputlookup foo.csv]
"inner" is the default type, so if rows are in left side and not in the right side, or in the right side and not on the left side, they'll be dropped.
the docs say that type="outer" and type="left" are synonymous. http://www.splunk.com/base/Documentation/latest/SearchReference/Join
and as far as I can tell there's no type="right". Is there another way?
One workaround is to use the append
command and then much like the disjunction trick (with the OR's) you glue the two sets back together with a stats
. Say you have a field called 'lastConnected' in your lookup table, and some stats coming back in your outer search like sum(kb) you can do this.
search foo | stats sum(kb) as KB by sourceHost | append [inputlookup foo.csv] | stats first(KB) as KB first(lastConnected) as lastConnected by sourceHost
Yeah. technically our "outer" is "left outer", not "right outer". There is also no "full".
One workaround is to use the append
command and then much like the disjunction trick (with the OR's) you glue the two sets back together with a stats
. Say you have a field called 'lastConnected' in your lookup table, and some stats coming back in your outer search like sum(kb) you can do this.
search foo | stats sum(kb) as KB by sourceHost | append [inputlookup foo.csv] | stats first(KB) as KB first(lastConnected) as lastConnected by sourceHost
oh nice! yea that's much better. Were you to write it up as an answer I'd probably pick it and delete/merge my own. 😃
You can also avoid using a subsearch entirely in this case (along with it's attendant size and time limits etc) by using the append=t
option on | inputlookup
instead: search foo | stats sum(kb) as KB by sourceHost | inputlookup append=t foo.csv | stats first(KB) lastConnected by sourceHost