All Apps and Add-ons
Highlighted

Which is the best approach to join two database tables in SPLUNK?

Path Finder

Hi,
I have 6 database tables and I have to create dashboards depending on these tables. I have to use joins to get the data in the desired way. So I wanted to know what should I use: the SQL joins or SPL joins?

Few are some of the join conditions
Join of Table A and B
Join of Table A and C
Join of Tables A,B and C

I was thinking of retrieving the data of each table using dbxquery in the base search and then joining the tables using SPL joins. I don't know which one is better. Any help would be appreciated.

Thanks

0 Karma
Highlighted

Re: Which is the best approach to join two database tables in SPLUNK?

SplunkTrust
SplunkTrust

So, first, start by going and reading this post - https://answers.splunk.com/answers/561130/how-to-join-two-tables-where-the-key-is-named-diff.html


Second, no, you don't have to use "joins". Or, more specifically, you do not have to use the version of an SPL join-type-verb that happens to use the join keyword.

There are at least five different ways to approach joins in SPL, and the one that happens to use the join keyword is seldom the best choice of method.

While join is the basic... really the only SQL term for connecting data in a relational database, in SPL there are various interesting ways of using subsearches, and also in descending order of preference stats (including eventstats and streamstats), lookup, join, transaction, map, and a couple of other obscure methods.


Given the above information, how to deal with any particular combination of query requirements is this:

First, any data that can be connected together as part of the SQL query, if it CAN be, SHOULD be. Why bring disconnected data all back to Splunk for analysis and processing when it is already in a relational database with explicitly specified relationships? Especially, if an SQL join is not an equijoin - for instance, if you are using a date//time on a sale in one table to determine the price for that item based on the correct date/time range for that item on price records in another table - then get that complex logic done by the DBMS on the relational side, where the code is a bit more obvious. Splunk can accomplish that, but the code is not nearly as self-evident as the SQL would be.

Second, there are concrete (and annoying) limits on subqueries and joins in SPL, so avoid those when you can. That means the default method is the method referred to as "Splunk soup" - pretty much the search pseudocode in that linked article. Put everything together in a pot and stir until it becomes what you want.

Third, within those, there are a wide range of options, and which one is most efficient or practical is going to depend on the characteristics of your data. For instance if your two tables were in different databases, and Table B was small and sparse, then it might be most efficient to extract Table B into a lookup table and then use the lookup verb.