Optimizing Splunk Search Query without Using Joins and Subquery due to Record Limitation

Arpit-Dwivedi
New Member

 

Hello Community,


I'm looking for guidance on optimizing a Splunk search that combines several searches with joins. The main problem is the limit on subsearches, which caps the records they can return at 50,000.
Here's the current query structure I'm working with:


index="sample" "message.process"="*app-name1" "message.flowName"="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| rename "message.flowName" as sapi-outbound-call
| stats count by sapi-outbound-call UUID
| join type=inner UUID
[search index="sample" "message.process"="*app-name2" "message.flowName"="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| rename "message.flowName" as exp-inbound-call]
| stats count by exp-inbound-call sapi-outbound-call
| join left=L right=R where L.exp-inbound-call = R.exp-inbound-call
[search index="sample" "message.process"="*app-name2" "message.flowName"="*| *"
| rename "message.flowName" as exp-inbound-call
| stats count by exp-inbound-call]
| stats list(*) AS * by R.exp-inbound-call R.count
| table R.exp-inbound-call R.count L.sapi-outbound-call L.count

The goal is to produce statistics from two searches, keeping only the events that share a common UUID. However, each join depends on a subsearch, so the results are cut off at the 50,000-record cap.


I'm looking for alternative approaches that achieve the same result without relying on joins and subsearches. Any insights, suggestions, or examples would be very welcome.
Thank you in advance for your help and expertise!


Regards


tscroggins
Champion

 

 

Hi @Arpit-Dwivedi,

A description of your call graph may better illustrate the problem, but given your SPL, I would expect the following output, where [] represents a list or array:

 

R.exp-inbound-call R.count L.sapi-outbound-call[] L.count[]

 

You can often reduce joins to one or more stats commands. This may produce the expected output:

 

index=sample message.process IN (*app-name1 *app-name2) message.flowName="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| eval call-type=case(like('message.process', "%app-name1"), "sapi-outbound-call", like('message.process', "%app-name2"), "exp-inbound-call")
| eval {call-type}='message.flowName'
| stats list(exp-inbound-call) as exp-inbound-call list(sapi-outbound-call) as sapi-outbound-call by UUID
| stats count by exp-inbound-call sapi-outbound-call
| stats max(count) as exp-inbound-call-count list(sapi-outbound-call) as sapi-outbound-call list(count) as sapi-outbound-call-count by exp-inbound-call
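
If it helps to see the pattern in isolation, here is a minimal sketch you can run on any search head without touching the real index. It assumes nothing about your data: the four events come from makeresults, and the field names (process, flowName, call_type) and sample values are made-up stand-ins for message.process and message.flowName. I used underscores instead of hyphens in the generated field names only to sidestep quoting in eval.

```spl
| makeresults count=4
| streamstats count as n
``` two hypothetical UUIDs, each seen once by each app ```
| eval UUID=case(n<=2, "uuid-1", true(), "uuid-2")
| eval process=if(n % 2 == 1, "x-app-name1", "x-app-name2")
| eval flowName=if(n % 2 == 1, "GET | /orders", "POST | /checkout")
``` tag each event with the column its flow name belongs in ```
| eval call_type=if(like(process, "%app-name1"), "sapi_outbound_call", "exp_inbound_call")
``` {call_type} creates a field named after the value of call_type ```
| eval {call_type}=flowName
``` one row per UUID carrying both sides, which replaces the inner join ```
| stats list(exp_inbound_call) as exp_inbound_call list(sapi_outbound_call) as sapi_outbound_call by UUID
``` count the inbound/outbound pairs across all UUIDs ```
| stats count by exp_inbound_call sapi_outbound_call
```

The point of the sketch is that both applications are read in a single pass and correlated by stats on UUID, so no subsearch is involved and the subsearch cap does not come into play. Note that a UUID seen by only one application is dropped by the second stats, because the other by-field is null for that row; that is roughly what type=inner gives you in the original search.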

 

 
