Splunk Enterprise

Slow join - Alternative to join for linking calls from upstream applications

vbarra
Engager

Hi All,

I know the topic is quite extensively documented in several posts within splunk community but I could not really figure out what is best to apply in the case below.

some context about architecture in use
We have basically 3 layers (so index) which a FE call goes through:
  • apigee 
  • mise (microservices)
  • eCommerce applicaiton

In other words, a call initiated by FE is routed to apigee from where it goes to some microservice which in turn might call the eCommerce application. The calls are uniquely identified by a requestId so given an apigee call I can get exactly which are the related calls to eCommerce because of the requestId.


Splunk dashboard
I'm building a dashboard where:
  • panel 1: I list all apigee calls grouped by common url to get some stats out out of it (so far so good). Something like:  
    > oce/soe/orders/v1/*/billingProfile (normalized url), then display the columns: count, avg(duration), perc90(duration)
    > oce/soe/orders/v1/*/delivery-method (normalized url), then display the columns: count, avg(duration), perc90(duration)
    > ...
  • panel 2: given the focus on an one apigee "normalized" call from panel one I list all related eCommerce calls. The goal is to grab some stats over such ep calls, grouping those by common urls and then average duration and taking perc90. To make this working I do use a join but it's very slow. Something like, given the focus on billingProfile apigee call above:

ecommerceapp/billinginfo/store/*/ (normalized url),  then display the columns: count, avg(duration), perc90(duration)

 


I want to ask if you see any other way to reach the same goal without a join or if you have any generally hint to improve the performance. 

Thanks in advance,
Vincenzo

 
I report below the index search I'm currently using highlighting the part of interest:

 

 

 

 

 

index=ecommerce_prod (namespace::prod-onl) (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$") "Status=20*" | where isnotnull( SCS_Request_ID) | rex field=_raw "(?<operation>(?<![\w\d])(GET|POST|PUT|DELETE)(?![\w\d]))" | rex field=_raw "(?<=[/^(POST|GET|PUT|DELETE)$/] )(?<service>[\/a-zA-Z\.].+?(?=HTTP))" | rex field=_raw "(?<=Duration=)(?<duration>[0-9]*)" | eval temp=split(service,"/")  | eval field1=mvindex(temp,1) | eval field2=mvindex(temp,2) | eval field3=mvindex(temp,3) | eval field4=mvindex(temp,4) | eval field4=if(like(field4,"%=%") OR like(field4,"%?%"), "*", field4) | eval field5=mvindex(temp,5)| eval field5=if(like(field5, "%=%") OR like(field5,"%?%"), "*", field5) | eval url_short=if(isnull(field5),field1."/".field2."/".field3."/".field4."/", field1."/".field2."/".field3."/".field4."/".field5)  | eval fullName = operation." ".url_short | table SCS_Request_ID, operation, url_short, duration | join SCS_Request_ID [ search index="apigee" (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$")  status="20*" | rename tgrq_h_scs-request-id as SCS_Request_ID | table SCS_Request_ID | where isnotnull( SCS_Request_ID) ] | stats count, avg(duration) as avg_, perc90(duration) as perc90_ by operation, url_short | eval avg_=round(avg_,2) | eval perc90_=round(perc90_,2)<div> </div>

 

 

 

 

 

Labels (1)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

In your example, you are only getting the SCS_Request_ID from apigee in the join so it is effectively just being a filter. As such, would adding it to the initial search not do the same thing?

index=ecommerce_prod (namespace::prod-onl) (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$") "Status=20*" [ search index="apigee" (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$")  status="20*" | rename tgrq_h_scs-request-id as SCS_Request_ID | table SCS_Request_ID | where isnotnull( SCS_Request_ID) | format] | ...

View solution in original post

0 Karma

vbarra
Engager

thanks a lot @ITWhisperer , indeed it worked (sorry it was obvious but I was somehow thinking to join only)

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

In your example, you are only getting the SCS_Request_ID from apigee in the join so it is effectively just being a filter. As such, would adding it to the initial search not do the same thing?

index=ecommerce_prod (namespace::prod-onl) (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$") "Status=20*" [ search index="apigee" (earliest="$timeRange1.earliest$" latest="$timeRange1.latest$")  status="20*" | rename tgrq_h_scs-request-id as SCS_Request_ID | table SCS_Request_ID | where isnotnull( SCS_Request_ID) | format] | ...
0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...