Splunk Search

Basic join on two virtual indexes

ddrillic
Ultra Champion

We have a claims table in Hunk and a provider table, both came from an RDBMS to Hadoop via sqoop.
How can I join these two tables on the provider id, which is a foreign key on the claims table and the primary key on the provider table?

I see https://answers.splunk.com/answers/122594/hunk-join-2-virtual-indexes.html, but I'm not clear about it...

Tags (2)
0 Karma

ddrillic
Ultra Champion

I'm trying the join command first ; - )

1) index=claim 362657618
---- It immediately brings prov_tin as a numerical field

2) index=provider 362657618

---- It immediately brings tin as a numerical field

3) index=claim 362657618 | join prov_tin [search provider | rename tin AS prov_tin]
---- It spins and spins

Is the syntax of #3 right?

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

In #3 you've omitted the long number from the joined search, and the index= has gone missing. In the spirit of today's excellent virtual.conf talk by @sideview (recording here: http://wiki.splunk.com/Virtual_.conf ), try this:

(index=claim OR index=provider) 362657618 | eval common_tin = if(index="claim", prov_tin, tin) | stats values(*) as * by common_tin

sideview
SplunkTrust
SplunkTrust
values(*) as *

is awesome, provided that you put an explicit fields command in front of it... If (when) you ever have 100 or more fields incoming, search speed and memory usage can blow up unexpectedly.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

It's the best I can do with the information given in the question 😛

0 Karma

ddrillic
Ultra Champion

Thank you Martin!!

The query runs for 20 minutes or so but nothing comes back. Let's keep in mind that the claim index holds around 2 billion claims and each of them has hundreds of fields.

I do see the MapR job being generated and it's visible via the resource manager.

In order to run on it on a subset of the data I changed the first part to be -
((index=claim source="part-m-00078") OR index=provider) 362657618
The intent is to run the query only on one sqoop file. Does it look right?

I do get results now but I'm not sure they are right...

0 Karma

rdagan_splunk
Splunk Employee
Splunk Employee

The other angle to solve this is by accessing the database directly using Hunk with the DBConnect App - Lookup command:
http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Createandmanagedatabaselookups
and
http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/SQLtoSplunk

0 Karma

ddrillic
Ultra Champion

Interesting thing but, at this point, we probably won't choose to make a round-trip back to the DB.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust
0 Karma

ddrillic
Ultra Champion

Much appreciated - let me please check it...

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...