Optimizing Splunk Search Query without Using Joins and Subquery due to Record Limitation

Arpit-Dwivedi · ‎11-16-2023

Hello Community,

I'm looking for guidance on optimizing a Splunk search that combines several searches using joins. The main problem is the subsearch limit, which caps the results at 50,000 records.
Here's the current query structure I'm working with:


index="sample" "message.process"="*app-name1" "message.flowName"="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| rename "message.flowName" as sapi-outbound-call
| stats count by sapi-outbound-call UUID
| join type=inner UUID
 [search index="sample" "message.process"="*app-name2" "message.flowName"="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| rename "message.flowName" as exp-inbound-call]
| stats count by exp-inbound-call sapi-outbound-call
| join left=L right=R where L.exp-inbound-call = R.exp-inbound-call
 [search index="sample" "message.process"="*app-name2" "message.flowName"="*| *"
| rename "message.flowName" as exp-inbound-call
| stats count by exp-inbound-call]
| stats list(*) AS * by R.exp-inbound-call R.count
| table R.exp-inbound-call R.count L.sapi-outbound-call L.count

The intent of this query is to generate statistics from two searches, keeping only the events that share a common UUID. However, chaining multiple joins with subsearches runs into the 50,000-record cap.

I'm looking for alternative approaches or optimizations to achieve the same result without relying heavily on joins within subqueries. Any insights, suggestions, or examples would be incredibly valuable.
Thank you in advance for your help and expertise!

Regards

tscroggins · ‎11-25-2023

Hi @Arpit-Dwivedi,

A description of your call graph may better illustrate the problem, but given your SPL, I would expect the following output, where [] represents a list or array:

R.exp-inbound-call R.count L.sapi-outbound-call[] L.count[]

You can often reduce joins to one or more stats commands. This may produce the expected output:

index=sample message.process IN (*app-name1 *app-name2) message.flowName="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| eval call-type=case(like('message.process', "%app-name1"), "sapi-outbound-call", like('message.process', "%app-name2"), "exp-inbound-call")
| eval {call-type}='message.flowName'
| stats list(exp-inbound-call) as exp-inbound-call list(sapi-outbound-call) as sapi-outbound-call by UUID
| stats count by exp-inbound-call sapi-outbound-call
| stats max(count) as exp-inbound-call-count list(sapi-outbound-call) as sapi-outbound-call list(count) as sapi-outbound-call-count by exp-inbound-call
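
To try the pattern without access to the original index, here is a minimal, self-contained sketch built on makeresults. Everything in it is an assumption made for illustration: the five sample events, the values demo-app-name1, demo-app-name2, uuid-1 through uuid-3, inbound-X, outbound-A and outbound-B, and the underscore field names (process, flowName, call_type, exp_inbound_call, sapi_outbound_call), which stand in for the hyphenated and dotted names in the real data. It is not the original data or a tested replacement for the search above, only a small instance of the same technique.

```spl
| makeresults count=5
| streamstats count as n
| eval process=if(n%2=1, "demo-app-name1", "demo-app-name2")
| eval UUID=case(n<=2, "uuid-1", n<=4, "uuid-2", true(), "uuid-3")
| eval flowName=case(n=1, "outbound-A", n=3, "outbound-B", n=5, "outbound-A", true(), "inbound-X")
| eval call_type=if(like(process, "%app-name1"), "sapi_outbound_call", "exp_inbound_call")
| eval {call_type}=flowName
| stats list(exp_inbound_call) as exp_inbound_call list(sapi_outbound_call) as sapi_outbound_call by UUID
| stats count by exp_inbound_call sapi_outbound_call
```

The first stats collects both call names per UUID in a single pass, which takes the place of the inner join on UUID. In this sample, uuid-3 has only a demo-app-name1 event, so its exp_inbound_call is null after the first stats. Because stats leaves out rows that are missing a by field, the second stats should drop that UUID, much as the inner join would. No subsearch is involved, so the 50,000-record subsearch cap does not come into play.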
