I am dealing with a massive dataset, i need to perform a join operation on a commonid. I have tried multiple variations of the query but i am facing issues with each of them. It would be great if anyone can provide a solution(Modifying the configuration files is not an option).
Query Version 1: This query is taking up too much of memory and the query is auto-finalizing after reaching 1GB data limit, consequently the results are inaccurate.
index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" (api_name=API1 OR api_name=API2 )
| stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID
| fillnull PQR value="NULL"
| search XYZ=DEF
| timechart span=1d count by PQR
Query Version 2: This query appears to be more efficient, but I am getting "[subsearch]: Search auto-finalized after time limit (60 seconds) reached" and consequently i am not able to obtain the accurate results.
index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" (api_name=API1 OR api_name=API2 )
[search index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" (api_name=API1 OR api_name=API2 )
| search XYZ=DEF | fields COMMONID
| dedup COMMONID]
| stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID
| fillnull PQR value="NULL"
| timechart span=1d count by PQR
Query Version 3: This query will probably give accurate results, but it is very inefficient will take about 20 hours to finish execution.
index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" (api_name=API1 OR api_name=API2 )
| search (PQR=* OR NOT PQR=*)
| join COMMONID
[search index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" (api_name=API1 OR api_name=API2 )
| search XYZ=DEF ]
| stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID
| fillnull PQR value="NULL"
| timechart span=1d count by PQR
I don't get | search (PQR=* OR NOT PQR=*)
; it does absolutely nothing so why is it there?
I want all the values of PQR to be considered (including NULL values), you're probably right though, i guess it should work even without "| search (PQR=* OR NOT PQR=*)"
If that part disappears then the search simplifies.
1) In query 2 replace | fields COMMONID | dedup COMMONID
with |stats by COMMONID
for a slight boost.
2) You could break query 2 down into two pieces and have piece 1 output a lookup to be used by piece 2. That way, the combined searches have a much higher total limit.
First half...
index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice"
(api_name=API1 OR api_name=API2 )
| search XYZ=DEF
| stats by COMMONID
| outputlookup myCommonIds.csv
Second half...
index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice"
(api_name=API1 OR api_name=API2 )
[| inputlookup append=t myCommonIds.csv]
| stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID
| fillnull PQR value="NULL"
| timechart span=30m count by PQR
3) Since you're not doing anything to the records before the stats command, you could break the search down even further with multisearch
...
| multisearch
[ search index=INDEXA earliest=-7d@d latest=-5d@d sourcetype=GHI "service=randomservice"
(api_name=API1 OR api_name=API2 )
[| inputlookup append=t myCommonIds.csv]
]
[ search index=INDEXA earliest=-5d@d latest=-3d@d sourcetype=GHI "service=randomservice"
(api_name=API1 OR api_name=API2 )
[| inputlookup append=t myCommonIds.csv]
]
[ search index=INDEXA earliest=-3d@d latest=@d sourcetype=GHI "service=randomservice"
(api_name=API1 OR api_name=API2 )
[| inputlookup append=t myCommonIds.csv]
]
| stats first(_time) as _time, values(XYZ) AS XYZ, values(PQR) AS PQR by COMMONID
| fillnull PQR value="NULL"
| timechart span=30m count by PQR
Do you have access to doing summary searches? If so, it appears that you could use that method to break up your search into smaller, already summarized chunks of data. That way your end result would take FAR less time to produce at search time.
just a quick idea, i'm not sure if they're applicable or not for your data.
Can you do: index=INDEXA earliest=-7d@d latest=@d sourcetype=GHI "service=randomservice" XYZ=DEF (api_name=API1 OR api_name=API2 )| stats first(_time) as _time, values(PQR) AS PQR by COMMONID | fillnull PQR value="NULL" | timechart span=1d count by PQR