Solved: Slow join - Alternative to join for linking calls ...

vbarra · ‎10-06-2021

Hi All,

I know the topic is quite extensively documented in several posts within splunk community but I could not really figure out what is best to apply in the case below.

some context about architecture in use

We have basically 3 layers (so index) which a FE call goes through:

apigee
mise (microservices)
eCommerce applicaiton

In other words, a call initiated by FE is routed to apigee from where it goes to some microservice which in turn might call the eCommerce application. The calls are uniquely identified by a requestId so given an apigee call I can get exactly which are the related calls to eCommerce because of the requestId.

Splunk dashboard

I'm building a dashboard where:

panel 1: I list all apigee calls grouped by common url to get some stats out out of it (so far so good). Something like:
> oce/soe/orders/v1/*/billingProfile (normalized url), then display the columns: count, avg(duration), perc90(duration)
> oce/soe/orders/v1/*/delivery-method (normalized url), then display the columns: count, avg(duration), perc90(duration)
> ...
panel 2: given the focus on an one apigee "normalized" call from panel one I list all related eCommerce calls. The goal is to grab some stats over such ep calls, grouping those by common urls and then average duration and taking perc90. To make this working I do use a join but it's very slow. Something like, given the focus on billingProfile apigee call above:

> ecommerceapp/billinginfo/store/*/ (normalized url), then display the columns: count, avg(duration), perc90(duration)

I want to ask if you see any other way to reach the same goal without a join or if you have any generally hint to improve the performance.

Thanks in advance,
Vincenzo

I report below the index search I'm currently using highlighting the part of interest:

index=ecommerce_prod (namespace::prod-onl) (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$") "Status=20*" | where isnotnull( SCS_Request_ID) | rex field=_raw "(?<operation>(?<![\w\d])(GET|POST|PUT|DELETE)(?![\w\d]))" | rex field=_raw "(?<=[/^(POST|GET|PUT|DELETE)$/] )(?<service>[\/a-zA-Z\.].+?(?=HTTP))" | rex field=_raw "(?<=Duration=)(?<duration>[0-9]*)" | eval temp=split(service,"/")  | eval field1=mvindex(temp,1) | eval field2=mvindex(temp,2) | eval field3=mvindex(temp,3) | eval field4=mvindex(temp,4) | eval field4=if(like(field4,"%=%") OR like(field4,"%?%"), "*", field4) | eval field5=mvindex(temp,5)| eval field5=if(like(field5, "%=%") OR like(field5,"%?%"), "*", field5) | eval url_short=if(isnull(field5),field1."/".field2."/".field3."/".field4."/", field1."/".field2."/".field3."/".field4."/".field5)  | eval fullName = operation." ".url_short | table SCS_Request_ID, operation, url_short, duration | join SCS_Request_ID [ search index="apigee" (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$")  status="20*" | rename tgrq_h_scs-request-id as SCS_Request_ID | table SCS_Request_ID | where isnotnull( SCS_Request_ID) ] | stats count, avg(duration) as avg_, perc90(duration) as perc90_ by operation, url_short | eval avg_=round(avg_,2) | eval perc90_=round(perc90_,2)<div> </div>

ITWhisperer · ‎10-06-2021

In your example, you are only getting the SCS_Request_ID from apigee in the join so it is effectively just being a filter. As such, would adding it to the initial search not do the same thing?

index=ecommerce_prod (namespace::prod-onl) (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$") "Status=20*" [ search index="apigee" (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$")  status="20*" | rename tgrq_h_scs-request-id as SCS_Request_ID | table SCS_Request_ID | where isnotnull( SCS_Request_ID) | format] | ...

View solution in original post

vbarra · ‎10-06-2021

thanks a lot @ITWhisperer , indeed it worked (sorry it was obvious but I was somehow thinking to join only)

ITWhisperer · ‎10-06-2021

In your example, you are only getting the SCS_Request_ID from apigee in the join so it is effectively just being a filter. As such, would adding it to the initial search not do the same thing?

index=ecommerce_prod (namespace::prod-onl) (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$") "Status=20*" [ search index="apigee" (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$")  status="20*" | rename tgrq_h_scs-request-id as SCS_Request_ID | table SCS_Request_ID | where isnotnull( SCS_Request_ID) | format] | ...

Slow join - Alternative to join for linking calls from upstream applications

using Splunk Enterprise

Splunk Observability for AI

Splunk Enterprise Security 8.x: The Essential Upgrade for Threat Detection, ...

Splunk Observability as Code: From Zero to Dashboard

Are you a member of the Splunk Community?

Slow join - Alternative to join for linking calls from upstream applications

using Splunk Enterprise

Splunk Observability for AI

Splunk Enterprise Security 8.x: The Essential Upgrade for Threat Detection, ...

Splunk Observability as Code: From Zero to Dashboard