Splunk Search

How to reduce the number of joins?

uagraw01
Builder

Hello Splunkers!!

As per the search below, you can see that we have used join commands to get results from the same index & sourcetype. Due to the multiple join commands the query becomes slow. Please help me understand how I can use a single join command to get the result for all the fields in "| fields - Total_Orders, Errors, Technical_Error, Operational_Error".


<search>

| join max=0 _time
[| search ((index=* OR index=_*) index=abc sourcetype=abc)
| fields + _time, host, source, sourcetype, Active, ErrorCode, ErrorDescription, ErrorDuration, ErrorId, From, Id, Location, ModuleId, OperationalWeighingFactor, ShuttleId, TechnicalWeighingFactor, TraceFlags, TraceId, TraceVersion, Version, date_hour, date_mday, date_minute, date_month, date_second, date_wday, date_year, index, Recoverable
| eval weeknum=strftime('_time',"%V")
| eval date_year=strftime('_time',"%Y"), date_month=strftime('_time',"%B"), day_week=strftime('_time',"%A"), date_mday=strftime('_time',"%d"), date_hour=strftime('_time',"%H"), date_minute=strftime('_time',"%M")
| search (date_year="*" date_month="*" weeknum="*" day_week="*" date_hour="*" date_minute="*" ShuttleId=*)
| fields + Id, _time, ErrorId, ErrorDescription
| table ErrorId, _time
| timechart span="1d@d1" count(ErrorId) as "Errors"]
| sort 0 _time
| fillnull Total_Orders Errors value="0"
| eval Total_Error_Per_10000_Order=round(((Errors / Total_Orders) * 10000),0)
| join max=0 _time
[| search ((index=* OR index=_*) index=abc sourcetype=abc)
| fields + _time, host, source, sourcetype, Active, ErrorCode, ErrorDescription, ErrorDuration, ErrorId, From, Id, Location, ModuleId, OperationalWeighingFactor, ShuttleId, TechnicalWeighingFactor, TraceFlags, TraceId, TraceVersion, Version, date_hour, date_mday, date_minute, date_month, date_second, date_wday, date_year, index, Recoverable
| eval weeknum=strftime('_time',"%V")
| eval date_year=strftime('_time',"%Y"), date_month=strftime('_time',"%B"), day_week=strftime('_time',"%A"), date_mday=strftime('_time',"%d"), date_hour=strftime('_time',"%H"), date_minute=strftime('_time',"%M")
| search (date_year="*" date_month="*" weeknum="*" day_week="*" date_hour="*" date_minute="*" ShuttleId=*)
| fields + Id, _time, ErrorId, ErrorDescription, TechnicalWeighingFactor
| rename TechnicalWeighingFactor as Technical_Error
| table _time, ErrorId, Technical_Error
| search Technical_Error>0.01
| timechart span="1d@d1" count(Technical_Error) as "Technical_Error"
| fillnull Technical_Error value="0"]
| fillnull Total_Orders Technical_Error value="0"
| eval Technical_Error_Per_10000_Order=round(((Technical_Error / Total_Orders) * 10000),0)
| join max=0 _time
[| search ((index=* OR index=_*) index=abc sourcetype=abc)
| fields + _time, host, source, sourcetype, Active, ErrorCode, ErrorDescription, ErrorDuration, ErrorId, From, Id, Location, ModuleId, OperationalWeighingFactor, ShuttleId, TechnicalWeighingFactor, TraceFlags, TraceId, TraceVersion, Version, date_hour, date_mday, date_minute, date_month, date_second, date_wday, date_year, index, Recoverable
| eval weeknum=strftime('_time',"%V")
| eval date_year=strftime('_time',"%Y"), date_month=strftime('_time',"%B"), day_week=strftime('_time',"%A"), date_mday=strftime('_time',"%d"), date_hour=strftime('_time',"%H"), date_minute=strftime('_time',"%M")
| search (date_year="*" date_month="*" weeknum="*" day_week="*" date_hour="*" date_minute="*" ShuttleId=*)
| fields + Id, _time, ErrorId, ErrorDescription, OperationalWeighingFactor
| rename OperationalWeighingFactor as Operational_Error
| table _time, ErrorId, Operational_Error
| search Operational_Error>0.01
| timechart span="1d@d1" count(Operational_Error) as "Operational_Error"
| fillnull Operational_Error value="0"]
| fillnull Total_Orders Operational_Error Technical_Error value="0"
| eval Operational_Error_Per_10000_Order=round(((Operational_Error / Total_Orders) * 10000),0)
| fields - Total_Orders, Errors, Technical_Error, Operational_Error


yuanliu
SplunkTrust

As @gcusello points out, these subsearches are extremely confusing, especially when used in combination with join on _time.  Even setting aside the cost of join, most operations inside the subsearches are pointless with respect to the final timechart command.  Then, the final fields command removes all the additional fields that these subsearches expensively build up and bring in through the join commands.

Semantically, all these (outer) join commands combine to achieve only this: remove from the main search any event that

  1. does not have a _time value falling exactly at time of day 00:00:00, or
  2. falls on a day in which there is no event in index=abc sourcetype=abc satisfying the conditions ShuttleId=* AND TechnicalWeighingFactor>0.01, or
  3. falls on a day in which there is no event in index=abc sourcetype=abc satisfying the conditions ShuttleId=* AND OperationalWeighingFactor>0.01.

So, I doubt there is a need for any join at all.  Of course, this really depends on whether the main search is also on index=abc sourcetype=abc.  If not, I can suggest the following:

| join _time
    [search index=abc sourcetype=abc ShuttleId=* OperationalWeighingFactor>0.01 TechnicalWeighingFactor>0.01
    | timechart span="1d@d1" count
    | fields - count]

If the main search is also on index=abc sourcetype=abc, absolutely get rid of join and restructure the entire search.
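
For example, here is a minimal join-free sketch along those lines. It assumes Total_Orders can be derived from the same events by counting Id per day, which may not match what your actual main search does, so treat it as a starting point rather than a drop-in replacement:

index=abc sourcetype=abc ShuttleId=*
| timechart span=1d count(Id) as Total_Orders count(ErrorId) as Errors count(eval(TechnicalWeighingFactor>0.01)) as Technical_Error count(eval(OperationalWeighingFactor>0.01)) as Operational_Error
| eval Total_Error_Per_10000_Order=round((Errors/Total_Orders)*10000,0), Technical_Error_Per_10000_Order=round((Technical_Error/Total_Orders)*10000,0), Operational_Error_Per_10000_Order=round((Operational_Error/Total_Orders)*10000,0)
| fields - Total_Orders, Errors, Technical_Error, Operational_Error

The count(eval(...)) aggregations replace the two extra subsearches: each one counts, per day, the events whose weighing factor exceeds 0.01.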

PickleRick
SplunkTrust

OK, a side question - is this a manually crafted search or has it been generated by some tool?

uagraw01
Builder

@PickleRick It's manually crafted.


gcusello
SplunkTrust

Hi @uagraw01,

it's really hard to help you because I don't have your data to test the search; I can only give you some general hints:

first of all, don't use "((index=* OR index=_*) index=abc sourcetype=abc)": if you already specify the index and sourcetype, you don't need to add the first part of the search.

then, don't use _time as the join key, because even a millisecond difference can prevent the join from matching.
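
If a time-based join really is unavoidable, one option (shown here only as a sketch, not tailored to your data) is to bucket _time to the same span on both sides before joining:

| bin _time span=1d
| join max=0 _time
    [search index=abc sourcetype=abc
    | bin _time span=1d
    | stats count as Errors by _time]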

also, I don't understand what result you want to have, because you always use the same subsearch and the same time period, so what's the scope of your search?

if you want to correlate data from different searches, you have to put all the searches in the main search, then find a correlation field to use in a stats command to group the results; in the stats command you add all the fields you need using the stats functions (values, earliest, last, etc.).
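
As a sketch of that pattern only (the "orders" index and sourcetype here are hypothetical placeholders, not taken from your environment), with _time bucketed per day as the correlation field:

(index=orders sourcetype=orders) OR (index=abc sourcetype=abc ShuttleId=*)
| bin _time span=1d
| stats count(eval(sourcetype="orders")) as Total_Orders count(ErrorId) as Errors by _time
| eval Total_Error_Per_10000_Order=round((Errors/Total_Orders)*10000,0)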

Ciao.

Giuseppe
