Knowledge Management

join on billions of records

splunkannm
New Member

My dataset that I need to join on is about half a billion. Since a subsearch is slow what is the alternative to using the join?

Tags (1)
0 Karma

woodcock
Esteemed Legend

Try this:

(index="abc" AND (sourcetype="a" OR sourcetype="b")  AND [|makeresults | eval search="_time>=" . relative_time(now(), "-36mon") | rex field=search "s/\"//g" | table search ])
OR (index="abc" AND sourcetype="c"))

| fields ACCT_NUM1 ACCT_NUM2 *
| eval right_acct_num=case(
   sourcetype=="a", ACCT_NUM1,
   sourcetype=="b", ACCT_NUM2,
   sourcetype=="c", acct_num,
   true(), "ERROR")
| stats values(*) as * BY right_acct_num
| rename COMMENT AS "Double-check me here; 'b' is from the first right join and the second left join."
| where sourcetype="b"
0 Karma

somesoni2
SplunkTrust
SplunkTrust

Give this a try

((index="abc") (sourcetype="a" OR sourcetype="b" OR sourcetype="c" ))  earliest=-36months 
| fields - _raw date* punct time* 
| fields ACCT_NUM1 ACCT_NUM2 *
| eval "right_acct_num"=case(sourcetype=="a", ACCT_NUM1,sourcetype=="b",ACCT_NUM2,true(),acct_num) 
| stats values(sourcetype) as sourcetypes latest(*) as * by right_acct_num     
| where NOT (mvcount(sourcetypes)=1 AND sourcetypes='c")
0 Karma

woodcock
Esteemed Legend

Show us your entire search and we can help. You have given us nothing to work with. See some of my posts about using appendpipe instead and maybe you can figure out something.

0 Karma

splunkannm
New Member

((index="abc") (sourcetype="a" OR sourcetype="b")) earliest=-36months
| fields ACCT_NUM1 ACCT_NUM2 *
| eval "right_acct_num"=if(sourcetype=="a", ACCT_NUM1,ACCT_NUM2)
| stats latest() as * by right_acct_num

| join type="left" right_acct_num
[ search index = abc sourcetype=c
|rename acct_num as right_acct_num
| fields - _raw date
punct time*
| table * ]

The table that is being joined on has about 500 million records.

0 Karma

somesoni2
SplunkTrust
SplunkTrust

That would depends upon the query that you're using currently. At high level, following gives the possible alternative commands:
http://docs.splunk.com/Documentation/Splunk/7.1.0/SearchReference/Join#Alternative_commands

splunkannm
New Member

Its a simple left join on one filed, just that the number of records are huge and the number of fields also are too many

0 Karma

somesoni2
SplunkTrust
SplunkTrust

Would you mind sharing your current query (mask anything sensitive) and tell us what type of reporting you want to do after join those records? Sometimes two queries can be merged into one at base search level itself (the best option with least amount of limitations) or sometimes they need to be appended using subsearch (subsearch used for smallest dataset between two). So, it's case by case basis.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...