Basic join on two virtual indexes

ddrillic · ‎03-30-2016

We have a claims table in Hunk and a provider table, both came from an RDBMS to Hadoop via sqoop.
How can I join these two tables on the provider id, which is a foreign key on the claims table and the primary key on the provider table?

I see https://answers.splunk.com/answers/122594/hunk-join-2-virtual-indexes.html, but I'm not clear about it...

ddrillic · ‎03-31-2016

I'm trying the join command first ; - )

1) index=claim 362657618
---- It immediately brings prov_tin as a numerical field

2) index=provider 362657618

---- It immediately brings tin as a numerical field

3) index=claim 362657618 | join prov_tin [search provider | rename tin AS prov_tin]
---- It spins and spins

Is the syntax of #3 right?

martin_mueller · ‎03-31-2016

In #3 you've omitted the long number from the joined search, and the index= has gone missing. In the spirit of today's excellent virtual.conf talk by @sideview (recording here: http://wiki.splunk.com/Virtual_.conf ), try this:

(index=claim OR index=provider) 362657618 | eval common_tin = if(index="claim", prov_tin, tin) | stats values(*) as * by common_tin

sideview · ‎03-31-2016

values(*) as *

is awesome, provided that you put an explicit fields command in front of it... If (when) you ever have 100 or more fields incoming, search speed and memory usage can blow up unexpectedly.

martin_mueller · ‎03-31-2016

It's the best I can do with the information given in the question 😛

ddrillic · ‎03-31-2016

Thank you Martin!!

The query runs for 20 minutes or so but nothing comes back. Let's keep in mind that the claim index holds around 2 billion claims and each of them has hundreds of fields.

I do see the MapR job being generated and it's visible via the resource manager.

In order to run on it on a subset of the data I changed the first part to be -
((index=claim source="part-m-00078") OR index=provider) 362657618
The intent is to run the query only on one sqoop file. Does it look right?

I do get results now but I'm not sure they are right...

rdagan_splunk · ‎03-30-2016

The other angle to solve this is by accessing the database directly using Hunk with the DBConnect App - Lookup command:
http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Createandmanagedatabaselookups
and
http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/SQLtoSplunk

ddrillic · ‎03-31-2016

Interesting thing but, at this point, we probably won't choose to make a round-trip back to the DB.

martin_mueller · ‎03-30-2016

This isn't Hunk-specific, but should apply regardless: https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...

ddrillic · ‎03-31-2016

Much appreciated - let me please check it...

Basic join on two virtual indexes

Enterprise Security Content Update (ESCU) | New Releases

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

Index This | What are the 12 Days of Splunk-mas?