Splunk Search

Grouping URLs by their path variable pattern

kronite13
Explorer

I need to analyze API calls from logs: avg, min, max, percentile95, and percentile99 response time, and also hits per second.

So, if I have events like below :

/data/users/1443 | 0.5 sec
/data/users/2232 | 0.2 sec
/data/users/39 | 0.2 sec

Expectation: I want them to be grouped like below, as per their API pattern :

proxy            | max_response_time
/data/users/{id} | 0.5 sec

 

These path variables (like {id}) can be numerical or can be a string with special characters


I have about 3000 such API patterns containing path variables. They fall into 3 categories: those with a path variable only at the end, those with 1 or more path variables only in the middle, and those with 1 or more path variables in the middle as well as at the end. Note: there are no query arguments after the path (nothing like /data/view/{name}/pagecount?age=x); only the URI path is present.

proxy                              method   request_time
/data/users/{id}                    POST    0.046
/server/healthcheck/check/up        GET     0.001
/data/commons/people/multi_upsert   POST    0.141
/store/org/manufacturing/multi_read POST    0.363
/data/users/{id}/homepage/{name}    POST    0.084
/data/view/{name}/pagecount         PUT     0.043

Category 1 (path variable only at the end) :
/data/users/{id}                    POST    0.046

Category 2 (1 or more path variables only in the middle) :
/data/view/{name}/pagecount                         PUT     0.043
/data/view/{name}/details/{type}/pagecount          PUT     0.043

Category 3 (1 or more path variables in the middle and also at the end) :
/data/users/{id}/homepage/{name}    POST    0.084
/data/users/{id}/homepage/{type}/details/{name} POST    0.084

 

Current Query :

 

 

index="*myindex*" host="*abc*" host!=*ftp* sourcetype!=infra* sourcetype!=linux* sourcetype = "nginx:plus:access" 
| bucket span=1s _time
| stats count by env,tenant,uri_path,request_method,_time

 

 

 

I need the uri_path to be grouped as per the API patterns I have. 

 

One option is to add 3000 regex replace statements to the query, one per API pattern, like the one below, but that makes the query too heavy to parse. I tried something like this for a sample pattern, /api/data/users/{id} :

 

|rex mode=sed field=uri_path "s/\/api\/data\/users\/([^\/]+)$/\/api\/data\/users\/{id}/g"

 

 


richgalloway
SplunkTrust

I've done that sort of normalization using patterns within a case function.  Like this:

index="*myindex*" host="*abc*" host!=*ftp* sourcetype!=infra* sourcetype!=linux* sourcetype = "nginx:plus:access" 
| eval path=case(like(uri_path, "/data/users/%/homepage/%/details/%"), "/data/users/{id}/homepage/{type}/details/{name}",
like(uri_path, "/data/users/%/homepage/%"), "/data/users/{id}/homepage/{name}",
like(uri_path, "/data/users/%"), "/data/users/{id}",
like(uri_path, "/data/view/%/details/%/pagecount"), "/data/view/{name}/details/{type}/pagecount",
like(uri_path, "/data/view/%/pagecount"), "/data/view/{name}/pagecount",
1==1, uri_path)
| bin span=1s _time
| stats count by env,tenant,path,request_method,_time
---
If this reply helps you, Karma would be appreciated.

kronite13
Explorer

Hi @richgalloway ,

Thanks for the reply. I tried using it like this, but it gives a warning that a wildcard in the middle of a pattern might fail to match if the value in that position contains special characters. Do you think there is a way to prevent such misses and be sure that it works correctly in every case?


kronite13
Explorer

@richgalloway : I think using match() with exact regexes will do the trick; I am testing it now. I will let you know whether it handles all the cases, and if it does I will mark this as the accepted answer.

Thanks!
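
To illustrate the "exact regex" idea outside Splunk, here is a minimal Python sketch. It assumes every {variable} stands for exactly one path segment, so each placeholder becomes [^/]+ and the whole path must match. The names pattern_to_regex, PATTERNS and normalize are hypothetical helpers made up for this illustration; they are not part of Splunk.

```python
import re

def pattern_to_regex(pattern):
    """Turn '/data/users/{id}' into a regex where each {var} matches one path segment."""
    # Split on the {var} placeholders and escape the literal pieces in between.
    literal_parts = re.split(r"\{[^{}/]+\}", pattern)
    return re.compile("[^/]+".join(re.escape(part) for part in literal_parts))

# A few of the API patterns from this thread (the real list would have ~3000 entries).
PATTERNS = [
    "/data/users/{id}",
    "/data/users/{id}/homepage/{name}",
    "/data/view/{name}/pagecount",
]
COMPILED = [(p, pattern_to_regex(p)) for p in PATTERNS]

def normalize(uri_path):
    """Return the first API pattern that matches the whole path, else the path unchanged."""
    for pattern, regex in COMPILED:
        if regex.fullmatch(uri_path):
            return pattern
    return uri_path

print(normalize("/data/users/1443"))
print(normalize("/data/users/a.b%20c@x/homepage/main"))
print(normalize("/server/healthcheck/check/up"))
```

Because [^/]+ cannot cross a slash, a special character inside a path variable does not cause a miss, and /data/users/{id} cannot accidentally swallow a longer path the way a % wildcard in like() can. The order of the patterns then only matters if a fixed path collides with a variable segment.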


venkatasri
SplunkTrust

Hi @kronite13 

Unless you have a unique identification field associated with each dynamic URL pattern, what you are trying to do is correct and gives you the desired result. Having a unique ID for each dynamic URL is very rare in logs.

  • Know your URL patterns up front
  • Replace the dynamic portion of each URL using rex in sed mode
  • Apply the stats aggregation functions (max, min, avg) to response_time

 

Another approach I can think of, instead of rex mode=sed: match the URLs against patterns to sort them into categories, assign each category a unique value, then group by that value.

Example pseudo code (you can use if, case, or similar conditional functions; it's up to the coder):

if url is like /data/user/something-1 then set category="url-1"

if url is like /data/users/some-id/homepage/some-name then set category="url-2"

stats earliest(url) as url_sample, max(response_time)... by category

Then rewrite url_sample into the format you want to display for readability, e.g. /data/user/{id}
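
The pseudo code above can be sketched in Python to show the flow: categorize, then aggregate by category. This is only an illustration under assumptions. The CATEGORIES table, its labels and the helper names are hypothetical, and the sample events are the three from the original question.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical category table: label -> (regex for the whole path, display form).
CATEGORIES = {
    "url-1": (re.compile(r"/data/users/[^/]+"), "/data/users/{id}"),
    "url-2": (re.compile(r"/data/users/[^/]+/homepage/[^/]+"),
              "/data/users/{id}/homepage/{name}"),
}

def categorize(url):
    """Return the label of the first category whose regex matches the whole URL."""
    for label, (regex, _display) in CATEGORIES.items():
        if regex.fullmatch(url):
            return label
    return None

def summarize(events):
    """Group response times by display form and compute max, min and avg per group."""
    grouped = defaultdict(list)
    for url, response_time in events:
        label = categorize(url)
        # Uncategorized URLs are kept under their own path.
        key = CATEGORIES[label][1] if label else url
        grouped[key].append(response_time)
    return {key: {"max": max(times), "min": min(times), "avg": mean(times)}
            for key, times in grouped.items()}

# Sample events from the question: (uri_path, response time in seconds).
EVENTS = [("/data/users/1443", 0.5), ("/data/users/2232", 0.2), ("/data/users/39", 0.2)]
summary = summarize(EVENTS)
print(summary["/data/users/{id}"]["max"])  # 0.5
```

All three sample events fall into one group, /data/users/{id}, whose maximum response time is 0.5 sec, which is the grouping asked for in the question.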

---

An upvote would  be appreciated and Accept solution if this reply helps !


efika
Communicator

Hi @kronite13 ,

You don't necessarily need to end up with 3,000 regexes, but I think you will need some kind of reference list of the exposed API endpoints. You would import that list, possibly into a multivalue field, and compare each event's path against it to do the grouping you want.

 

Hope this helps.


kronite13
Explorer

Hi @efika ,

Thanks for your response!
Could you give an example for what you are saying?


What I understand is this: I need to add a new eval column containing all 3000 API endpoint patterns, comma separated (e.g. /data/user/details/{id}, /data/places/{place_name}/street, etc.), and then check whether the API endpoint in each event matches any of the patterns in that column. Is that what you mean?
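
To make the question concrete, here is one possible reading of the reference-list idea, sketched in Python rather than SPL. It compares paths segment by segment instead of using regexes. It assumes each {variable} fills exactly one segment, and it prefers the template with the most fixed segments when several match. The function names and the /data/users/list entry are hypothetical, added only for illustration.

```python
def load_templates(patterns):
    """Pre-split each endpoint template into its path segments once."""
    return [(p, p.strip("/").split("/")) for p in patterns]

def segment_matches(template_seg, actual_seg):
    """A {var} segment matches any non-empty value; a fixed segment must be equal."""
    if template_seg.startswith("{") and template_seg.endswith("}"):
        return actual_seg != ""
    return template_seg == actual_seg

def find_template(uri_path, templates):
    """Return the matching template with the most fixed segments, else the path itself."""
    actual = uri_path.strip("/").split("/")
    best, best_fixed = None, -1
    for pattern, segs in templates:
        if len(segs) != len(actual):
            continue  # a different number of segments can never match
        if all(segment_matches(t, a) for t, a in zip(segs, actual)):
            fixed = sum(1 for t in segs if not t.startswith("{"))
            if fixed > best_fixed:
                best, best_fixed = pattern, fixed
    return best if best is not None else uri_path

TEMPLATES = load_templates([
    "/data/user/details/{id}",
    "/data/places/{place_name}/street",
    "/data/users/{id}",
    "/data/users/list",  # hypothetical fixed endpoint that collides with {id}
])

print(find_template("/data/places/new-york%2Cny/street", TEMPLATES))
print(find_template("/data/users/list", TEMPLATES))
```

With this reading, the check is "does any template in the list match this event's path", and the event is then grouped under the template that matched.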
