Splunk Search

Grouping URLs by their path variable pattern

kronite13
Explorer

I need to analyze API calls from logs: avg, min, max, and 95th and 99th percentile response time, plus hits per second.

So, if I have events like below :

/data/users/1443 | 0.5 sec
/data/users/2232 | 0.2 sec
/data/users/39 | 0.2 sec

Expectation: I want them to be grouped like below, as per their API pattern :

proxy            | max_response_time
/data/users/{id} | 0.5 sec

 

These path variables (like {id}) can be numerical or can be a string with special characters


I have about 3000 such API patterns with path variables in them. They fall into 3 categories: those with a path variable only at the end, those with 1 or more path variables only in the middle, and those with 1 or more path variables both in the middle and at the end. Note: there are no query arguments after the path (nothing like /data/view/{name}/pagecount?age=x); only the URI path is present.

proxy                              method   request_time
/data/users/{id}                    POST    0.046
/server/healthcheck/check/up        GET     0.001
/data/commons/people/multi_upsert   POST    0.141
/store/org/manufacturing/multi_read POST    0.363
/data/users/{id}/homepage/{name}    POST    0.084
/data/view/{name}/pagecount         PUT     0.043

Category 1 (path variable only at the end) :
/data/users/{id}                    POST    0.046

Category 2 (1 or more path variables only in the middle) :
/data/view/{name}/pagecount                         PUT     0.043
/data/view/{name}/details/{type}/pagecount          PUT     0.043

Category 3 (1 or more path variables in the middle and also at the end) :
/data/users/{id}/homepage/{name}    POST    0.084
/data/users/{id}/homepage/{type}/details/{name} POST    0.084

 

Current Query :

index="*myindex*" host="*abc*" host!=*ftp* sourcetype!=infra* sourcetype!=linux* sourcetype = "nginx:plus:access"
| bucket span=1s _time
| stats count by env,tenant,uri_path,request_method,_time

I need the uri_path to be grouped according to the API patterns I have.

One option is to add 3000 regex replace statements to the query, one per API pattern, like the one below, but that makes the query too heavy to parse. I tried something like this for a sample pattern, /api/data/users/{id} :

| rex mode=sed field=uri_path "s/\/api\/data\/users\/([^\/]+)$/\/api\/data\/users\/{id}/g"


richgalloway
SplunkTrust

I've done that sort of normalization using patterns within a case function.  Like this:

index="*myindex*" host="*abc*" host!=*ftp* sourcetype!=infra* sourcetype!=linux* sourcetype = "nginx:plus:access" 
| eval path=case(like(uri_path, "/data/users/%/homepage/%/details/%"), "/data/users/{id}/homepage/{type}/details/{name}",
like(uri_path, "/data/users/%/homepage/%"), "/data/users/{id}/homepage/{name}",
like(uri_path, "/data/users/%"), "/data/users/{id}",
like(uri_path, "/data/view/%/details/%/pagecount"), "/data/view/{name}/details/{type}/pagecount",
like(uri_path, "/data/view/%/pagecount"), "/data/view/{name}/pagecount",
1==1, uri_path)
| bin span=1s _time
| stats count by env,tenant,path,request_method,_time
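
The query above only counts hits. For the response-time figures in the question, the final stats could be swapped for something like the sketch below. It assumes the response time is in a field called request_time (the column name in the sample table), and the hits_per_sec calculation (hits divided by the span between the first and last event of each group) is just one possible definition, not the only one. The first four lines only fabricate three sample events so the search runs on its own; in the real search they would be replaced by the base search and the eval above.

```spl
| makeresults count=3
| streamstats count AS n
| eval _time=_time+n, path="/data/users/{id}", request_method="POST"
| eval request_time=case(n==1, 0.5, n==2, 0.2, n==3, 0.2)
| stats count AS hits,
        avg(request_time) AS avg_rt,
        min(request_time) AS min_rt,
        max(request_time) AS max_rt,
        perc95(request_time) AS p95_rt,
        perc99(request_time) AS p99_rt,
        min(_time) AS first_seen,
        max(_time) AS last_seen
        by path, request_method
| eval hits_per_sec=round(hits / max(last_seen - first_seen, 1), 3)
| fields - first_seen, last_seen
```

Keeping the bin span=1s version alongside this is still useful if you want the per-second counts themselves rather than an average rate.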
---
If this reply helps you, Karma would be appreciated.

kronite13
Explorer

Hi @richgalloway ,

Thanks for the reply. I tried using it like this, but it gives a warning that a wildcard in the middle of a pattern might cause matching problems if there are special characters in its place. Do you think there is a way to prevent any misses due to this and be sure that it matches correctly every time?


kronite13
Explorer

@richgalloway : I think using match() with exact regexes will do the trick; I am testing it. If it handles all the cases, I will let you know and mark this as the accepted answer.

Thanks!
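
For reference, a minimal sketch of what the match() variant could look like. The sample paths are made up and generated with makeresults so the search runs on its own. Each [^/]+ stands for exactly one path segment (any characters except a slash, so special characters in a path variable are fine), and the ^ and $ anchors stop a short pattern from matching a longer URL. Because of the anchors, the order of the case() branches should no longer matter; they are listed shortest first here on purpose. Any regex metacharacters in the fixed parts of a real pattern (a literal dot, for example) would need escaping.

```spl
| makeresults
| eval uri_path=split("/data/users/1443,/data/users/39/homepage/bob,/data/view/sales-2024_q1/pagecount,/server/healthcheck/check/up", ",")
| mvexpand uri_path
| eval path=case(
    match(uri_path, "^/data/users/[^/]+$"), "/data/users/{id}",
    match(uri_path, "^/data/users/[^/]+/homepage/[^/]+$"), "/data/users/{id}/homepage/{name}",
    match(uri_path, "^/data/users/[^/]+/homepage/[^/]+/details/[^/]+$"), "/data/users/{id}/homepage/{type}/details/{name}",
    match(uri_path, "^/data/view/[^/]+/pagecount$"), "/data/view/{name}/pagecount",
    match(uri_path, "^/data/view/[^/]+/details/[^/]+/pagecount$"), "/data/view/{name}/details/{type}/pagecount",
    true(), uri_path)
| table uri_path path
```

URLs that match none of the patterns, like the healthcheck one, fall through to the last branch and keep their original uri_path.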


venkatasri
SplunkTrust

Hi @kronite13 

Unless you have a unique identification field associated with each dynamic URL pattern, what you are trying to do is correct and gives you the desired result. Having a unique ID for each dynamic URL is very rare in logs.

  • Know your URL patterns upfront
  • Replace the dynamic portion of the URL using rex in sed mode
  • Apply stats aggregation functions (max, min, avg) to response_time

 

Another approach I could think of, instead of rex mode=sed: match the URLs against the patterns, assign each pattern's URLs a unique category value, then group by that value.

Example pseudo code (you can use if, case, like or similar conditionals; it is up to the coder):

if url is like /data/users/something-1 then set category="url-1"

if url is like /data/users/some-id/homepage/some-name then set category="url-2"

stats earliest(url) as url_sample, max(response_time)... by category

Then change url_sample into the format you want to display for readability, e.g. /data/users/{id}
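
The pseudo code above might translate into SPL roughly as follows. This is only a sketch: the sample URLs, the generated response_time values, and the category names url-1 / url-2 are placeholders, and match() with one-segment regexes is used for the conditions, though like() or if() would also work.

```spl
| makeresults
| eval uri_path=split("/data/users/1443,/data/users/2232,/data/users/39/homepage/bob", ",")
| mvexpand uri_path
| streamstats count AS n
| eval response_time=n/10
| eval category=case(
    match(uri_path, "^/data/users/[^/]+$"), "url-1",
    match(uri_path, "^/data/users/[^/]+/homepage/[^/]+$"), "url-2",
    true(), "other")
| stats earliest(uri_path) AS url_sample, max(response_time) AS max_response_time, avg(response_time) AS avg_response_time by category
| eval display=case(category=="url-1", "/data/users/{id}", category=="url-2", "/data/users/{id}/homepage/{name}", true(), url_sample)
```

The display field is where the readable pattern is restored; url_sample just keeps one real URL per category for sanity checking.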

---

An upvote would be appreciated, and please Accept the solution if this reply helps!


efika
Communicator

Hi @kronite13 ,

You don't necessarily need to end up with 3,000 regexes, but I think you will need some kind of reference list of the exposed API endpoints that you import (possibly into a multivalue field) and then check each URL against for similarity in order to do the grouping you want.

Hope this helps.
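
One way such a reference could live outside the query is a wildcard lookup. A rough sketch, with everything named here being an assumption: a CSV lookup file with two columns, uri_pattern (e.g. /data/users/*/homepage/*) and proxy (e.g. /data/users/{id}/homepage/{name}), exposed through a lookup definition called api_patterns that has match_type = WILDCARD(uri_pattern) and max_matches = 1 set in transforms.conf (or under the advanced options of the lookup definition). The base search is shortened from the one in the question.

```spl
index="*myindex*" host="*abc*" sourcetype="nginx:plus:access"
| lookup api_patterns uri_pattern AS uri_path OUTPUT proxy
| eval proxy=coalesce(proxy, uri_path)
| bin span=1s _time
| stats count by env, tenant, proxy, request_method, _time
```

One caveat: the * wildcard in a lookup also matches slashes, so /data/users/* would match /data/users/39/homepage/bob as well. With max_matches = 1 the first matching row in file order is used, so the rows would need to be sorted with the most specific patterns first. The coalesce() keeps the raw uri_path for anything the lookup does not cover.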


kronite13
Explorer

Hi @efika ,

Thanks for your response!
Could you give an example of what you mean?

What I understand is that I need to add a new eval column containing all 3000 API endpoint patterns, comma separated (e.g. /data/user/details/{id}, /data/places/{place_name}/street), and then check whether the API endpoint in the event matches any of the patterns in that column. Is that what you mean?
