Splunk Search

Basic join on two virtual indexes

ddrillic
Ultra Champion

We have a claims table in Hunk and a provider table, both came from an RDBMS to Hadoop via sqoop.
How can I join these two tables on the provider id, which is a foreign key on the claims table and the primary key on the provider table?

I see https://answers.splunk.com/answers/122594/hunk-join-2-virtual-indexes.html, but I'm not clear about it...

Tags (2)
0 Karma

ddrillic
Ultra Champion

I'm trying the join command first ; - )

1) index=claim 362657618
---- It immediately brings prov_tin as a numerical field

2) index=provider 362657618

---- It immediately brings tin as a numerical field

3) index=claim 362657618 | join prov_tin [search provider | rename tin AS prov_tin]
---- It spins and spins

Is the syntax of #3 right?

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

In #3 you've omitted the long number from the joined search, and the index= has gone missing. In the spirit of today's excellent virtual.conf talk by @sideview (recording here: http://wiki.splunk.com/Virtual_.conf ), try this:

(index=claim OR index=provider) 362657618 | eval common_tin = if(index="claim", prov_tin, tin) | stats values(*) as * by common_tin

sideview
SplunkTrust
SplunkTrust
values(*) as *

is awesome, provided that you put an explicit fields command in front of it... If (when) you ever have 100 or more fields incoming, search speed and memory usage can blow up unexpectedly.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

It's the best I can do with the information given in the question 😛

0 Karma

ddrillic
Ultra Champion

Thank you Martin!!

The query runs for 20 minutes or so but nothing comes back. Let's keep in mind that the claim index holds around 2 billion claims and each of them has hundreds of fields.

I do see the MapR job being generated and it's visible via the resource manager.

In order to run on it on a subset of the data I changed the first part to be -
((index=claim source="part-m-00078") OR index=provider) 362657618
The intent is to run the query only on one sqoop file. Does it look right?

I do get results now but I'm not sure they are right...

0 Karma

rdagan_splunk
Splunk Employee
Splunk Employee

The other angle to solve this is by accessing the database directly using Hunk with the DBConnect App - Lookup command:
http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Createandmanagedatabaselookups
and
http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/SQLtoSplunk

0 Karma

ddrillic
Ultra Champion

Interesting thing but, at this point, we probably won't choose to make a round-trip back to the DB.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust
0 Karma

ddrillic
Ultra Champion

Much appreciated - let me please check it...

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Event Series: Telemetry Pipeline Management

Balancing Scale and Spend: Gaining Control Over High-Volume Metrics in Splunk Observability Cloud As ...

Kick the Tires Before You Commit: A Hands-On Tour of the Splunk Observability Cloud ...

Evaluating an enterprise observability platform usually goes like this: fill out a form, get a free trial with ...

Deep insights, no barriers: Splunk Observability Cloud Free Edition

As software delivery cycles continue to accelerate, observability shouldn’t be a luxury — it should be a ...