Optimizing Splunk Search Query without Using Joins and Subquery due to Record Limitation

Arpit-Dwivedi

Hello Community,


I'm looking for guidance on optimizing a Splunk search that combines several searches using joins. The main problem is the subsearch limit, which caps the results of each subsearch at 50,000 records.
Here's the current query I'm working with:


index="sample" "message.process"="*app-name1" "message.flowName"="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| rename "message.flowName" as sapi-outbound-call
| stats count by sapi-outbound-call UUID
| join type=inner UUID
[search index="sample" "message.process"="*app-name2" "message.flowName"="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| rename "message.flowName" as exp-inbound-call]
| stats count by exp-inbound-call sapi-outbound-call
| join left=L right=R where L.exp-inbound-call = R.exp-inbound-call
[search index="sample" "message.process"="*app-name2" "message.flowName"="*| *"
| rename "message.flowName" as exp-inbound-call
| stats count by exp-inbound-call]
| stats list(*) AS * by R.exp-inbound-call R.count
| table R.exp-inbound-call R.count L.sapi-outbound-call L.count

The intent of this query is to produce statistics from two searches whose events are correlated on a common UUID. However, each join relies on a subsearch, so the results are truncated at the 50,000-record cap.


I'm looking for alternative approaches or optimizations that achieve the same result without relying on joins and subsearches. Any insights, suggestions, or examples would be very welcome.
Thank you in advance for your help and expertise!


Regards


tscroggins

Hi @Arpit-Dwivedi,

A description of your call graph may better illustrate the problem, but given your SPL, I would expect the following output, where [] represents a list or array:

 

R.exp-inbound-call R.count L.sapi-outbound-call[] L.count[]

 

You can often reduce joins to one or more stats commands. This may produce the expected output:

 

index=sample message.process IN (*app-name1 *app-name2) message.flowName="*| *"
| rex field=message.correlationId "(?<UUID>^[0-9a-z-]{0,36})"
| eval call-type=case(like('message.process', "%app-name1"), "sapi-outbound-call", like('message.process', "%app-name2"), "exp-inbound-call")
| eval {call-type}='message.flowName'
| stats list(exp-inbound-call) as exp-inbound-call list(sapi-outbound-call) as sapi-outbound-call by UUID
| stats count by exp-inbound-call sapi-outbound-call
| stats max(count) as exp-inbound-call-count list(sapi-outbound-call) as sapi-outbound-call list(count) as sapi-outbound-call-count by exp-inbound-call
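
To try the stats-instead-of-join pattern without touching real data, a minimal sketch along these lines may help. Everything in it is made up for illustration: the five generated events, the UUID values, the flow names, and the field names process, flowName, and call_type. Underscores are used instead of hyphens so that eval needs no quoting. The first stats groups both call types by UUID in a single pass, which is what replaces the inner join; the second stats counts each inbound/outbound pair.

```spl
| makeresults count=5
| streamstats count as n
| eval UUID=case(n<=2, "uuid-a", n<=4, "uuid-b", true(), "uuid-c")
| eval process=if(n%2==1, "app-name1", "app-name2")
| eval flowName=if(process=="app-name1", "sapi-flow-".n, "exp-flow")
| eval call_type=if(process=="app-name1", "sapi_outbound_call", "exp_inbound_call")
| eval {call_type}=flowName
| stats list(exp_inbound_call) as exp_inbound_call list(sapi_outbound_call) as sapi_outbound_call by UUID
| stats count by exp_inbound_call sapi_outbound_call
```

In this sample, uuid-c has only an app-name1 event, so its exp_inbound_call is null after the first stats. Because stats ignores rows where a by field is null, that UUID drops out of the second stats, which gives the same effect as an inner join on UUID without a subsearch.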

 

 
