My dataset that I need to join on is about half a billion. Since a subsearch is slow what is the alternative to using the join?
Try this:
(index="abc" AND (sourcetype="a" OR sourcetype="b") AND [|makeresults | eval search="_time>=" . relative_time(now(), "-36mon") | rex field=search "s/\"//g" | table search ])
OR (index="abc" AND sourcetype="c"))
| fields ACCT_NUM1 ACCT_NUM2 *
| eval right_acct_num=case(
sourcetype=="a", ACCT_NUM1,
sourcetype=="b", ACCT_NUM2,
sourcetype=="c", acct_num,
true(), "ERROR")
| stats values(*) as * BY right_acct_num
| rename COMMENT AS "Double-check me here; 'b' is from the first right join and the second left join."
| where sourcetype="b"
Give this a try
((index="abc") (sourcetype="a" OR sourcetype="b" OR sourcetype="c" )) earliest=-36months
| fields - _raw date* punct time*
| fields ACCT_NUM1 ACCT_NUM2 *
| eval "right_acct_num"=case(sourcetype=="a", ACCT_NUM1,sourcetype=="b",ACCT_NUM2,true(),acct_num)
| stats values(sourcetype) as sourcetypes latest(*) as * by right_acct_num
| where NOT (mvcount(sourcetypes)=1 AND sourcetypes='c")
Show us your entire search and we can help. You have given us nothing to work with. See some of my posts about using appendpipe
instead and maybe you can figure out something.
((index="abc") (sourcetype="a" OR sourcetype="b")) earliest=-36months
| fields ACCT_NUM1 ACCT_NUM2 *
| eval "right_acct_num"=if(sourcetype=="a", ACCT_NUM1,ACCT_NUM2)
| stats latest() as * by right_acct_num
| join type="left" right_acct_num
[ search index = abc sourcetype=c
|rename acct_num as right_acct_num
| fields - _raw date punct time*
| table * ]
The table that is being joined on has about 500 million records.
That would depends upon the query that you're using currently. At high level, following gives the possible alternative commands:
http://docs.splunk.com/Documentation/Splunk/7.1.0/SearchReference/Join#Alternative_commands
Its a simple left join on one filed, just that the number of records are huge and the number of fields also are too many
Would you mind sharing your current query (mask anything sensitive) and tell us what type of reporting you want to do after join those records? Sometimes two queries can be merged into one at base search level itself (the best option with least amount of limitations) or sometimes they need to be appended using subsearch (subsearch used for smallest dataset between two). So, it's case by case basis.