I need to analyze API calls using logs: avg, min, max, 95th-percentile and 99th-percentile response time, and also hits per second.
So, if I have events like below:
/data/users/1443 | 0.5 sec
/data/users/2232 | 0.2 sec
/data/users/39 | 0.2 sec
Expectation: I want them to be grouped like below, as per their API pattern:
proxy max_response_time
/data/users/{id} | 0.5 sec
These path variables (like {id}) can be numeric or can be strings containing special characters.
I have about 3,000 such API patterns that contain path variables. They can be categorized into 3 types: those that have a path variable only at the end, those that have 1 or more path variables only in the middle, and those that have 1 or more path variables in the middle as well as at the end. Note: there are no query arguments after the URI (i.e. nothing like /data/view/{name}/pagecount?age=x); there will be just the URI part.
proxy method request_time
/data/users/{id} POST 0.046
/server/healthcheck/check/up GET 0.001
/data/commons/people/multi_upsert POST 0.141
/store/org/manufacturing/multi_read POST 0.363
/data/users/{id}/homepage/{name} POST 0.084
/data/view/{name}/pagecount PUT 0.043
Category 1 (path variable only at the end):
/data/users/{id} POST 0.046
Category 2 (1 or more path variables only in the middle):
/data/view/{name}/pagecount PUT 0.043
/data/view/{name}/details/{type}/pagecount PUT 0.043
Category 3 (1 or more path variables in the middle as well as at the end):
/data/users/{id}/homepage/{name} POST 0.084
/data/users/{id}/homepage/{type}/details/{name} POST 0.084
Current query:
index="*myindex*" host="*abc*" host!=*ftp* sourcetype!=infra* sourcetype!=linux* sourcetype = "nginx:plus:access"
| bucket span=1s _time
| stats count by env,tenant,uri_path,request_method,_time
I need the uri_path to be grouped as per the API patterns I have.
One option is to add 3,000 regex replace statements, like the one below, to the query, one per API pattern, but that makes the query too heavy to parse. I tried something like this for a sample pattern /api/data/users/{id}:
|rex mode=sed field=uri_path "s/\/api\/data\/users\/([^\/]+)$/\/api\/data\/users\/{id}/g"
I've done that sort of normalization using patterns within a case function. Like this:
index="*myindex*" host="*abc*" host!=*ftp* sourcetype!=infra* sourcetype!=linux* sourcetype = "nginx:plus:access"
| eval path=case(like(uri_path, "/data/users/%/homepage/%/details/%"), "/data/users/{id}/homepage/{type}/details/{name}",
    like(uri_path, "/data/users/%/homepage/%"), "/data/users/{id}/homepage/{name}",
    like(uri_path, "/data/users/%"), "/data/users/{id}",
    like(uri_path, "/data/view/%/details/%/pagecount"), "/data/view/{name}/details/{type}/pagecount",
    like(uri_path, "/data/view/%/pagecount"), "/data/view/{name}/pagecount",
    1==1, uri_path)
| bin span=1s _time
| stats count by env,tenant,path,request_method,_time
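To also get the response-time metrics mentioned in the question (avg/min/max/p95/p99 plus per-second hits) in the same pass, the final stats can be extended. A sketch, assuming request_time holds the response time in seconds as in the sample events:

```
| bin span=1s _time
| stats count AS hits_per_sec
        avg(request_time) AS avg_rt
        min(request_time) AS min_rt
        max(request_time) AS max_rt
        perc95(request_time) AS p95_rt
        perc99(request_time) AS p99_rt
        by env, tenant, path, request_method, _time
```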
Hi @richgalloway ,
Thanks for the reply. I tried using it like this, but it gives a warning that a wildcard in the middle might cause matching issues if special characters appear in that position. Do you think there is a way to prevent any misses due to this and be sure it will work correctly 100% of the time?
@richgalloway: I think using match() with an exact regex will do the trick; I am testing it. I will let you know if it handles all the cases, and I will mark this as the accepted answer.
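For the record, this is roughly what I'm testing: match() with anchored regexes, so that [^/]+ can only span a single path segment and special characters inside a segment can't bleed across slashes (regexes derived from the sample patterns above):

```
| eval path=case(
    match(uri_path, "^/data/users/[^/]+/homepage/[^/]+$"), "/data/users/{id}/homepage/{name}",
    match(uri_path, "^/data/view/[^/]+/pagecount$"),       "/data/view/{name}/pagecount",
    match(uri_path, "^/data/users/[^/]+$"),                "/data/users/{id}",
    1==1, uri_path)
```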
Thanks!
Hi @kronite13
Unless you have a unique identification field associated with each dynamic URL pattern, what you are trying to do is correct and gives you the desired result. Having a unique ID for each dynamic URL is very rare in logs.
Another approach I can think of: instead of rex mode=sed, match the URL patterns into categories, assign each category a unique value, and then group by that value.
Example pseudo code (you can use if, case, or similar conditionals, it's up to the coder):
if url is like /data/user/something-1 then set category="url-1"
if url is like /data/users/some-id/homepage/some-name then set category="url-2"
stats earliest(url) as url_sample, max(response_time) ... by category
Then change url_sample to the format you want to display for readability, e.g. /data/user/{id}.
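In SPL the pseudo code could look like this (a sketch; the field names uri_path and request_time are assumed from the question):

```
| eval category=case(
    match(uri_path, "^/data/user/[^/]+$"),                 "url-1",
    match(uri_path, "^/data/users/[^/]+/homepage/[^/]+$"), "url-2",
    1==1, "other")
| stats earliest(uri_path) AS url_sample
        max(request_time)  AS max_response_time
        by category
```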
---
An upvote would be appreciated, and please accept the solution if this reply helps!
Hi @kronite13 ,
You don't necessarily need to end up with 3,000 regexes, but I think you will need some kind of reference list of the exposed API endpoints that you then import, possibly into a multivalue field, and check against for similarity in order to do the grouping you want.
Hope this helps.
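One concrete way to keep the 3,000 patterns out of the query itself is a wildcard lookup: store the patterns in a CSV and let Splunk do the matching. This is a sketch, not a drop-in solution; the lookup name api_patterns and the CSV columns are assumptions:

```
# transforms.conf -- assumed lookup definition; api_patterns.csv has two
# columns: uri_path (with * wildcards, e.g. /data/users/*) and
# pattern (the display form, e.g. /data/users/{id})
[api_patterns]
filename    = api_patterns.csv
match_type  = WILDCARD(uri_path)
max_matches = 1
```

The search then maps each event's uri_path to its pattern, falling back to the raw path when no pattern matches:

```
index="*myindex*" sourcetype="nginx:plus:access"
| lookup api_patterns uri_path OUTPUT pattern
| eval path=coalesce(pattern, uri_path)
| bin span=1s _time
| stats count by env, tenant, path, request_method, _time
```

One caveat: * in a wildcard lookup can match across / as well, so order the CSV rows with the most specific patterns first.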
Hi @efika ,
Thanks for your response!
Could you give an example for what you are saying?
What I am getting is: I need to add a new eval column with all 3,000 API endpoint patterns, comma separated (e.g. /data/user/details/{id}, /data/places/{place_name}/street, etc.), and then check whether the API endpoint in the event matches any of the API endpoint patterns I added in that eval column? Is that what you mean?