Splunk Search

Splunk Join Optimisation

deepuhassan
Explorer

Hi,

I have the below query which does the search on two different sources in the same index and join the results based app correlation id to get results and perform the stats operation. However, the source files are huge and hence the join is taking too longs to get me the results.

index=server sourcetype=perfromance source="*performance.log"  component_role=consumer 

| join  app_id [ search index=server sourcetype=component source="*component.log" | rename appCorId as app_id ]

| stats count(eval=(process_result="COMPLETED")) as Completed count(eval=(process_result="FAILED")) as Failed

This is a simple join but taking huge time when do a search for 24 hours.

Please help optimize this query.

Thanks,

Sandeep

 

 

Labels (1)
0 Karma
1 Solution

to4kawa
Ultra Champion

 

index=server (sourcetype=perfromance source="*performance.log" component_role=consumer ) OR (sourcetype=component source="*component.log")
| eval app_id=coalesce(appCorId,app_id)
| stats count(eval=(process_result="COMPLETED")) as Completed count(eval(process_result="FAILED")) as Failed dc(source) as flag by app_id
| where flag > 1
| stats sum(Completed) as Completed sum(Failed) as Failed

 

View solution in original post

0 Karma

to4kawa
Ultra Champion

 

index=server (sourcetype=perfromance source="*performance.log" component_role=consumer ) OR (sourcetype=component source="*component.log")
| eval app_id=coalesce(appCorId,app_id)
| stats count(eval=(process_result="COMPLETED")) as Completed count(eval(process_result="FAILED")) as Failed dc(source) as flag by app_id
| where flag > 1
| stats sum(Completed) as Completed sum(Failed) as Failed

 

0 Karma

deepuhassan
Explorer

Thanks.. It is partially working.

However, would you please explain how the steps you have taken to form the query?

What shall I do if the field name used for correlation is same in both the sources? 

0 Karma

to4kawa
Ultra Champion

However, would you please explain how the steps you have taken to form the query?

->https://conf.splunk.com/files/2019/slides/FNC2751.pdf

  Please see here.

 

What shall I do if the field name used for correlation is same in both the sources?

-> remove eval with coalesce

deepuhassan
Explorer

Thanks for the document.

I am not able to optimize the below query :

index=server sourcetype="performance"  source="*/performance.log" perf_component_role=kafkaProducer component_name="test-api-v1" event_status=COMPLETE  | eval publishedtime=_time
| join app_correlation_id [search index=server component_name="test-api-v2" environment=sit1 "Process updated Successfully !!" | eval processedtime=_time]
| eval endToEndTime = (processedtime -publishedtime) | table app_correlation_id publishedtime processedtime endToEndTime

0 Karma

to4kawa
Ultra Champion

stats range() by app_correlation_id  is useful. you can do it.

deepuhassan
Explorer

I have tried using the technique you have given me for the below query but it is not working. The below query will be used to calculate the time difference of an event logged in 2 different sources which has common correlation id. Since I am using join it is taking longer time.

Basically I am trying to find out the end to end time involved for a message which started from Kafka producer and processed by a consumer.

index=server sourcetype="performance"  source="*/performance.log" perf_component_role=kafkaProducer component_name="test-api-v1" event_status=COMPLETE  | eval publishedtime=_time
| join app_correlation_id [search index=server component_name="test-api-v2" environment=sit1 "Process updated Successfully !!" | eval processedtime=_time]
| eval endToEndTime = (processedtime -publishedtime) | table app_correlation_id publishedtime processedtime endToEndTime

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Data Management Digest – May 2026

Welcome to the May 2026 edition of Data Management Digest!   As your trusted partner in data innovation, the ...

Index This | What is feather-light but cannot be held long?

May 2026 Edition  Hayyy Splunk Education Enthusiasts and the Eternally Curious!   We’re back with this ...

.conf26 Registration is Live: Secure Your Early Bird Pass Now

  Lock in Your Spot: Registration Open for .conf26 in Denver Hello Splunkers, I have exciting news! Your ...