Knowledge Management

join on billions of records

splunkannm
New Member

My dataset that I need to join on is about half a billion. Since a subsearch is slow what is the alternative to using the join?

Tags (1)
0 Karma

woodcock
Esteemed Legend

Try this:

(index="abc" AND (sourcetype="a" OR sourcetype="b")  AND [|makeresults | eval search="_time>=" . relative_time(now(), "-36mon") | rex field=search "s/\"//g" | table search ])
OR (index="abc" AND sourcetype="c"))

| fields ACCT_NUM1 ACCT_NUM2 *
| eval right_acct_num=case(
   sourcetype=="a", ACCT_NUM1,
   sourcetype=="b", ACCT_NUM2,
   sourcetype=="c", acct_num,
   true(), "ERROR")
| stats values(*) as * BY right_acct_num
| rename COMMENT AS "Double-check me here; 'b' is from the first right join and the second left join."
| where sourcetype="b"
0 Karma

somesoni2
Revered Legend

Give this a try

((index="abc") (sourcetype="a" OR sourcetype="b" OR sourcetype="c" ))  earliest=-36months 
| fields - _raw date* punct time* 
| fields ACCT_NUM1 ACCT_NUM2 *
| eval "right_acct_num"=case(sourcetype=="a", ACCT_NUM1,sourcetype=="b",ACCT_NUM2,true(),acct_num) 
| stats values(sourcetype) as sourcetypes latest(*) as * by right_acct_num     
| where NOT (mvcount(sourcetypes)=1 AND sourcetypes='c")
0 Karma

woodcock
Esteemed Legend

Show us your entire search and we can help. You have given us nothing to work with. See some of my posts about using appendpipe instead and maybe you can figure out something.

0 Karma

splunkannm
New Member

((index="abc") (sourcetype="a" OR sourcetype="b")) earliest=-36months
| fields ACCT_NUM1 ACCT_NUM2 *
| eval "right_acct_num"=if(sourcetype=="a", ACCT_NUM1,ACCT_NUM2)
| stats latest() as * by right_acct_num

| join type="left" right_acct_num
[ search index = abc sourcetype=c
|rename acct_num as right_acct_num
| fields - _raw date
punct time*
| table * ]

The table that is being joined on has about 500 million records.

0 Karma

somesoni2
Revered Legend

That would depends upon the query that you're using currently. At high level, following gives the possible alternative commands:
http://docs.splunk.com/Documentation/Splunk/7.1.0/SearchReference/Join#Alternative_commands

splunkannm
New Member

Its a simple left join on one filed, just that the number of records are huge and the number of fields also are too many

0 Karma

somesoni2
Revered Legend

Would you mind sharing your current query (mask anything sensitive) and tell us what type of reporting you want to do after join those records? Sometimes two queries can be merged into one at base search level itself (the best option with least amount of limitations) or sometimes they need to be appended using subsearch (subsearch used for smallest dataset between two). So, it's case by case basis.

0 Karma
Get Updates on the Splunk Community!

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...