Optimizing Splunk Search Query without Using Joins and Subquery due to Record Limitation

Arpit-Dwivedi
New Member

 

Hello Community,

I'm looking for guidance on optimizing a Splunk search that combines several searches using joins. The main problem is the subsearch limit, which caps the records each join subsearch can return at 50,000.
Here's the current query:


index="sample" "message.process"="*app-name1" "message.flowName"="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| rename "message.flowName" as sapi-outbound-call
| stats count by sapi-outbound-call UUID
| join type=inner UUID
[search index="sample" "message.process"="*app-name2" "message.flowName"="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| rename "message.flowName" as exp-inbound-call]
| stats count by exp-inbound-call sapi-outbound-call
| join left=L right=R where L.exp-inbound-call = R.exp-inbound-call
[search index="sample" "message.process"="*app-name2" "message.flowName"="*| *"
| rename "message.flowName" as exp-inbound-call
| stats count by exp-inbound-call]
| stats list(*) AS * by R.exp-inbound-call R.count
| table R.exp-inbound-call R.count L.sapi-outbound-call L.count

The goal is to generate statistics from two searches, keeping only the events that share a common UUID. Because each join runs a subsearch, the results are subject to the 50,000-record cap.

I'm looking for alternative approaches that achieve the same result without relying on joins and subsearches. Any insights, suggestions, or examples would be very welcome.
Thank you in advance for your help and expertise!


Regards


tscroggins
Influencer

 

 

Hi @Arpit-Dwivedi,

A description of your call graph may better illustrate the problem, but given your SPL, I would expect the following output, where [] represents a list or array:

 

R.exp-inbound-call R.count L.sapi-outbound-call[] L.count[]

 

You can often replace joins with one or more stats commands over a single base search, which avoids the subsearch limit entirely. This may produce the expected output:

 

index=sample message.process IN (*app-name1 *app-name2) message.flowName="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| eval call-type=case(like('message.process', "%app-name1"), "sapi-outbound-call", like('message.process', "%app-name2"), "exp-inbound-call")
| eval {call-type}='message.flowName'
| stats list(exp-inbound-call) as exp-inbound-call list(sapi-outbound-call) as sapi-outbound-call by UUID
| stats count by exp-inbound-call sapi-outbound-call
| stats max(count) as exp-inbound-call-count list(sapi-outbound-call) as sapi-outbound-call list(count) as sapi-outbound-call-count by exp-inbound-call
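
A minimal, self-contained sketch of the same stats-instead-of-join pattern, run on synthetic events from makeresults rather than a real index, may make the mechanics easier to check. Everything in it is hypothetical: the demo- values are made up, and the underscore field names (sapi_outbound_call, exp_inbound_call) are a simplification to avoid quoting hyphenated names.

```spl
| makeresults count=4
| streamstats count as n
| eval process=if(n<=2, "app-name1", "app-name2")
| eval UUID=if(n%2==1, "demo-uuid-1", "demo-uuid-2")
| eval flowName=case(n==1, "demo-sapi-flow-a", n==2, "demo-sapi-flow-b", true(), "demo-exp-flow-x")
| eval sapi_outbound_call=if(process=="app-name1", flowName, null())
| eval exp_inbound_call=if(process=="app-name2", flowName, null())
| stats list(exp_inbound_call) as exp_inbound_call list(sapi_outbound_call) as sapi_outbound_call by UUID
| stats count by exp_inbound_call sapi_outbound_call
```

The first stats collects both sides of each UUID into a single row, which is the work the join was doing. The second stats skips any row where one of the by-fields is missing, so a UUID seen on only one side drops out, much like an inner join, and no subsearch is involved.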

 

 
