Hi all,
For those who already know some SQL, the join command is pretty easy. Some of my teammates who don't know SQL were not aware of joins, and when they tried to read the docs, they could not follow them easily. Hence I thought I'd create this post for everyone. Thanks.
Let's take two simple files:
ubuntu@sekar:~$ more /tmp/names1
name=a
name=b
name=c
name=e
name=f
ubuntu@sekar:~$ more /tmp/names2
name=d
name=f
name=g
name=h
name=i
ubuntu@sekar:~$
I uploaded these two files and used the join command:
1. Inner join example (inner join is the default join type):
2. Left join example:
3. Outer join example:
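To make the three cases concrete, here is a minimal Python sketch of what each join type keeps for the two sample files. It models only the semantics and is not Splunk code: the `join` function is a hypothetical helper, `names1` stands in for the main search and `names2` for the subsearch. Because both files carry only the `name` field, a left join has no extra fields to add here, so it simply keeps every main-search value.

```python
# Sample data copied from /tmp/names1 and /tmp/names2 above.
names1 = ["a", "b", "c", "e", "f"]  # plays the role of the main search
names2 = ["d", "f", "g", "h", "i"]  # plays the role of the subsearch


def join(main, sub, join_type="inner"):
    """Model a join on the single field `name` (hypothetical helper).

    inner:        keep only main values that also appear in the subsearch.
    left / outer: keep every main value, matched or not.
    """
    if join_type not in ("inner", "left", "outer"):
        raise ValueError(f"unknown join type: {join_type}")
    sub_values = set(sub)
    if join_type == "inner":
        return [name for name in main if name in sub_values]
    # "left" and "outer" are treated identically, as the documentation
    # quoted later in this thread describes.
    return list(main)


inner_result = join(names1, names2, "inner")
left_result = join(names1, names2, "left")
outer_result = join(names1, names2, "outer")

print("inner:", inner_result)
print("left: ", left_result)
print("outer:", outer_result)
```

Only `f` appears in both files, so it is the only value the inner join keeps, while the left and outer joins keep all five values from the first file.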
Accepting the above as the solution.
Please reply with your views and karma points 😉
Hi all,
Are the Splunk left join and outer join the same?
Descriptions for the join-options argument
type
Syntax: type=inner | outer | left
Description: Indicates the type of join to perform. The difference between an inner and a left (or outer) join is how the events are treated in the main search that do not match any of the events in the subsearch. In both inner and left joins, events that match are joined. The results of an inner join do not include events from the main search that have no matches in the subsearch. The results of a left (or outer) join includes all of the events in the main search and only those values in the subsearch have matching field values.
Default: inner
https://docs.splunk.com/Documentation/Splunk/8.0.4/SearchReference/Join
I think both are the same.
It's worth pointing out in any Splunk discussion of join that there are some hidden pitfalls that can be hard to detect with large data sets, particularly around the default limits on subsearch result size and subsearch run time.
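A rough Python sketch of why this pitfall is hard to see: if the subsearch side is cut off at some limit, the join still returns results, just fewer of them, and nothing signals the loss. The function name and the `subsearch_limit` value below are hypothetical and chosen only for illustration; Splunk's real limits are configurable and are not reproduced here.

```python
def truncated_inner_join(main, sub, subsearch_limit):
    """Inner join where only the first `subsearch_limit` subsearch rows are seen.

    No error is raised when rows are dropped, which is the point of the sketch.
    """
    visible = set(sub[:subsearch_limit])
    return [key for key in main if key in visible]


main_keys = list(range(10))  # keys 0..9 in the main search
sub_keys = list(range(10))   # the same keys in the subsearch

# With a limit that covers the whole subsearch, every key matches.
full = truncated_inner_join(main_keys, sub_keys, subsearch_limit=len(sub_keys))

# With a smaller (hypothetical) limit, matches silently disappear.
cut = truncated_inner_join(main_keys, sub_keys, subsearch_limit=4)

print(len(full), "matches without truncation")
print(len(cut), "matches with truncation")
```

Both calls succeed, so the only clue that something went wrong is a result count that is lower than expected.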
I find that SQL devs coming to Splunk will always try to skin the cat with a join and then increase limits when things don't work.
The alternative commands section at the top is a good starting point, and I have found it really useful to use stats as a starting point to combine multiple disparate data sets. I've generally found it faster than join, and for really large data sets join just will not work in any reasonable time frame.
That's not to say that join doesn't have a use, but it should rarely be the go-to command for a join-type operation. Working out how to do it the stats way gives you a better understanding of the data/pipeline flow in SPL.
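As an illustration of the stats-style idea, here is a minimal Python sketch using the two sample files from the original post. Instead of joining one data set against another, it reads both in a single pass, groups by `name`, and records which source each name came from, which is roughly the shape of a search ending in something like `stats values(source) by name`. The helper name `sources_by_name` and the source labels are assumptions made for this example.

```python
from collections import defaultdict

names1 = ["a", "b", "c", "e", "f"]
names2 = ["d", "f", "g", "h", "i"]


def sources_by_name(datasets):
    """Group every name by the set of sources it appears in (one pass, no join)."""
    grouped = defaultdict(set)
    for source, names in datasets.items():
        for name in names:
            grouped[name].add(source)
    return grouped


grouped = sources_by_name({"names1": names1, "names2": names2})

# Equivalent of an inner join: names seen in both sources.
in_both = sorted(name for name, srcs in grouped.items() if len(srcs) == 2)

# Equivalent of a left join: every name seen in the first source.
in_names1 = sorted(name for name, srcs in grouped.items() if "names1" in srcs)

print("in both:   ", in_both)
print("in names1: ", in_names1)
```

The grouping step does not depend on one side being a size-limited subsearch, which is one reason this pattern tends to hold up better on large data sets.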