- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Basic join on two virtual indexes
We have a claims table in Hunk and a provider table, both came from an RDBMS to Hadoop via sqoop.
How can I join these two tables on the provider id, which is a foreign key on the claims table and the primary key on the provider table?
I see https://answers.splunk.com/answers/122594/hunk-join-2-virtual-indexes.html, but I'm not clear about it...
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I'm trying the join command first ; - )
1) index=claim 362657618
---- It immediately brings prov_tin as a numerical field
2) index=provider 362657618
---- It immediately brings tin as a numerical field
3) index=claim 362657618 | join prov_tin [search provider | rename tin AS prov_tin]
---- It spins and spins
Is the syntax of #3 right?
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

In #3 you've omitted the long number from the joined search, and the index= has gone missing. In the spirit of today's excellent virtual.conf talk by @sideview (recording here: http://wiki.splunk.com/Virtual_.conf ), try this:
(index=claim OR index=provider) 362657618 | eval common_tin = if(index="claim", prov_tin, tin) | stats values(*) as * by common_tin
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

values(*) as *
is awesome, provided that you put an explicit fields command in front of it... If (when) you ever have 100 or more fields incoming, search speed and memory usage can blow up unexpectedly.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

It's the best I can do with the information given in the question 😛
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Martin!!
The query runs for 20 minutes or so but nothing comes back. Let's keep in mind that the claim index holds around 2 billion claims and each of them has hundreds of fields.
I do see the MapR job being generated and it's visible via the resource manager.
In order to run on it on a subset of the data I changed the first part to be -
((index=claim source="part-m-00078") OR index=provider) 362657618
The intent is to run the query only on one sqoop file. Does it look right?
I do get results now but I'm not sure they are right...
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The other angle to solve this is by accessing the database directly using Hunk with the DBConnect App - Lookup command:
http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Createandmanagedatabaselookups
and
http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/SQLtoSplunk
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Interesting thing but, at this point, we probably won't choose to make a round-trip back to the DB.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

This isn't Hunk-specific, but should apply regardless: https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Much appreciated - let me please check it...
