I've seen a lot of advice against using join with subsearches because it's slow, and that proves to be true in practice.
What I would like to find out is why it is slow. Any insight here would be helpful.
Take a look at these two articles, specifically the posts by @daljeanis :
https://answers.splunk.com/answers/561130/how-to-join-two-tables-where-the-key-is-named-diff.html
https://answers.splunk.com/answers/660008/which-is-the-best-approach-to-join-two-database-ta.html
The problem is that join is an SQL concept, and Splunk is not a relational database. The command exists (and works), but it's very often not the best approach.
I believe it's slow because of the algorithm and the virtual memory the join command uses (it basically has to build a Cartesian product of the two datasets and then work from there). That amount of processing and memory consumption often causes join subsearches to time out as well. If you haven't read it already, here is an excellent piece of Splunk documentation on when to use join and when to use its alternatives:
https://docs.splunk.com/Documentation/Splunk/7.2.4/Search/Abouteventcorrelation
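For what it's worth, here is a rough sketch of the stats-based pattern that is usually suggested in place of join. Everything named in it is made up for illustration: the indexes (orders, customers), the key fields (customer_id in one source, cust_id in the other) and the output fields (customer_name, order_total) are assumptions, so swap in your own. The idea is to pull both datasets in one search, normalize the key into a single field, and let stats group on it rather than running a subsearch:

```
(index=orders) OR (index=customers)
| eval customer_id=coalesce(customer_id, cust_id)
| stats values(customer_name) as customer_name, values(order_total) as order_total by customer_id
```

Because there is no subsearch here, the subsearch limits mentioned above should not come into play. Whether it is actually faster for your data is something to check in the Job Inspector rather than take on faith.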