Hi, I have a performance issue with a query using a "join" command.
The problem is that the outer search uses the time picker set to the last 4 hours, while the subsearch in the join (type=outer) uses "earliest=-30d".
Example of the query:
index="A" sourcetype="AB" source="C"
| eval launch_time=round(strptime(launch_time, "%Y-%m-%dT%H:%M:%S"),0)
| eval search_time=now()
| eval launched_since=round((search_time-launch_time)/86400,0)
| where launched_since > 7
| dedup id sortby -_time
| lookup all_ids account_id OUTPUT acc_name site | search site=*
| join type=outer id [ search index="A" sourcetype="AC" source="D" earliest=-30d
| lookup all_ids account_id OUTPUT acc_name site | search site=*
| rename agentId as id
| dedup rpg id
| sort rpg
| stats values(rpg_name) as pg by id acc_name site
| eval Name=if(like(pg,"%name1/%"),"Name1","Name2")
| table id title platform pg Name]
| table site acc_name id pg Name launched_since
| dedup acc_name id
| eval Name2=if(isnull(Name), "NULL", Name)
| stats count(id) as count by Name2
Is it possible to make this search more efficient?
Thanks in advance!
You need to be careful using join, especially when, as in your example, you are searching over 30 days. There are subsearch limits that apply to join, particularly on how long the subsearch may run and how many results it may return - I think the defaults are 60 seconds and 50,000 results.
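If you want to check or adjust those defaults, they live in limits.conf; from memory the relevant stanza looks roughly like this (verify the exact setting names and defaults against the limits.conf.spec for your Splunk version):

```ini
# limits.conf - join subsearch limits (defaults as I recall them; verify in limits.conf.spec)
[join]
subsearch_maxout = 50000     ; max number of results the join subsearch may return
subsearch_maxtime = 60       ; max number of seconds the join subsearch may run
```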
Generally there are always ways to avoid joins, and given your example I have the following observations.
I have almost always found the construct of
(search data set 1 date_range_1) OR (search data set 2 date_range_2)
| eval ds1=if(isnotnull(field_in_ds1),1,0)
and then using eval/if and stats to aggregate the two data sets accordingly to be more efficient than most joins when working with large data sets.
One big problem with joins is that you will almost never know that your join has hit a limit, so the subsearch may silently return an incomplete set of results, which will then make the outer search incorrect (but that's when we call them 'statistics', right?? :))
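To make that construct concrete, a minimal sketch of the pattern - the index names, join_key and field names here are placeholders, not taken from your query:

```
(index=idx1 earliest=-4h) OR (index=idx2 earliest=-30d)
| eval ds1=if(isnotnull(field_in_ds1),1,0)
| stats values(field_in_ds1) as f1, values(field_in_ds2) as f2, sum(ds1) as ds1_count by join_key
| where ds1_count > 0
```

The stats by join_key does the work the join would have done, and the ds1_count filter keeps only keys that appeared in the first data set (the equivalent of your outer join's driving side), without any subsearch limits applying.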
Hope this helps
Thanks for your answer.
I applied your recommendations, but the join is still really inefficient.
I did not mention it in my initial question, but I have a rex command in the subsearch: can this slow down the overall search?
1. Use fields to keep only the required fields right after the base search, in both searches. This will improve performance:
index="A" sourcetype="AB" source="C"
| fields X Y Z
2. You can optimize the subsearch, specifically these three commands:
| dedup rpg id
| sort rpg
| stats values(rpg_name) as pg by id acc_name site
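Since stats already aggregates on its by fields, and values() returns a sorted, deduplicated list, the dedup rpg id and sort rpg before it are likely redundant; those three commands can probably collapse to just the stats (worth confirming the output is identical on your data):

```
| stats values(rpg_name) as pg by id acc_name site
```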