Splunk Enterprise

Slow join - Alternative to join for linking calls from upstream applications

vbarra
Engager

Hi All,

I know the topic is quite extensively documented in several posts within splunk community but I could not really figure out what is best to apply in the case below.

some context about architecture in use
We have basically 3 layers (so index) which a FE call goes through:
  • apigee 
  • mise (microservices)
  • eCommerce applicaiton

In other words, a call initiated by FE is routed to apigee from where it goes to some microservice which in turn might call the eCommerce application. The calls are uniquely identified by a requestId so given an apigee call I can get exactly which are the related calls to eCommerce because of the requestId.


Splunk dashboard
I'm building a dashboard where:
  • panel 1: I list all apigee calls grouped by common url to get some stats out out of it (so far so good). Something like:  
    > oce/soe/orders/v1/*/billingProfile (normalized url), then display the columns: count, avg(duration), perc90(duration)
    > oce/soe/orders/v1/*/delivery-method (normalized url), then display the columns: count, avg(duration), perc90(duration)
    > ...
  • panel 2: given the focus on an one apigee "normalized" call from panel one I list all related eCommerce calls. The goal is to grab some stats over such ep calls, grouping those by common urls and then average duration and taking perc90. To make this working I do use a join but it's very slow. Something like, given the focus on billingProfile apigee call above:

ecommerceapp/billinginfo/store/*/ (normalized url),  then display the columns: count, avg(duration), perc90(duration)

 


I want to ask if you see any other way to reach the same goal without a join or if you have any generally hint to improve the performance. 

Thanks in advance,
Vincenzo

 
I report below the index search I'm currently using highlighting the part of interest:

 

 

 

 

 

index=ecommerce_prod (namespace::prod-onl) (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$") "Status=20*" | where isnotnull( SCS_Request_ID) | rex field=_raw "(?<operation>(?<![\w\d])(GET|POST|PUT|DELETE)(?![\w\d]))" | rex field=_raw "(?<=[/^(POST|GET|PUT|DELETE)$/] )(?<service>[\/a-zA-Z\.].+?(?=HTTP))" | rex field=_raw "(?<=Duration=)(?<duration>[0-9]*)" | eval temp=split(service,"/")  | eval field1=mvindex(temp,1) | eval field2=mvindex(temp,2) | eval field3=mvindex(temp,3) | eval field4=mvindex(temp,4) | eval field4=if(like(field4,"%=%") OR like(field4,"%?%"), "*", field4) | eval field5=mvindex(temp,5)| eval field5=if(like(field5, "%=%") OR like(field5,"%?%"), "*", field5) | eval url_short=if(isnull(field5),field1."/".field2."/".field3."/".field4."/", field1."/".field2."/".field3."/".field4."/".field5)  | eval fullName = operation." ".url_short | table SCS_Request_ID, operation, url_short, duration | join SCS_Request_ID [ search index="apigee" (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$")  status="20*" | rename tgrq_h_scs-request-id as SCS_Request_ID | table SCS_Request_ID | where isnotnull( SCS_Request_ID) ] | stats count, avg(duration) as avg_, perc90(duration) as perc90_ by operation, url_short | eval avg_=round(avg_,2) | eval perc90_=round(perc90_,2)<div> </div>

 

 

 

 

 

Labels (1)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

In your example, you are only getting the SCS_Request_ID from apigee in the join so it is effectively just being a filter. As such, would adding it to the initial search not do the same thing?

index=ecommerce_prod (namespace::prod-onl) (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$") "Status=20*" [ search index="apigee" (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$")  status="20*" | rename tgrq_h_scs-request-id as SCS_Request_ID | table SCS_Request_ID | where isnotnull( SCS_Request_ID) | format] | ...

View solution in original post

0 Karma

vbarra
Engager

thanks a lot @ITWhisperer , indeed it worked (sorry it was obvious but I was somehow thinking to join only)

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

In your example, you are only getting the SCS_Request_ID from apigee in the join so it is effectively just being a filter. As such, would adding it to the initial search not do the same thing?

index=ecommerce_prod (namespace::prod-onl) (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$") "Status=20*" [ search index="apigee" (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$")  status="20*" | rename tgrq_h_scs-request-id as SCS_Request_ID | table SCS_Request_ID | where isnotnull( SCS_Request_ID) | format] | ...
0 Karma
Get Updates on the Splunk Community!

Splunk Observability for AI

Don’t miss out on an exciting Tech Talk on Splunk Observability for AI!Discover how Splunk’s agentic AI ...

Splunk Enterprise Security 8.x: The Essential Upgrade for Threat Detection, ...

Watch On Demand the Tech Talk on November 6 at 11AM PT, and empower your SOC to reach new heights! Duration: ...

Splunk Observability as Code: From Zero to Dashboard

For the details on what Self-Service Observability and Observability as Code is, we have some awesome content ...