I'd like to collect two pieces of information from wildfly access logs in a single summary index: the average number of requests per minute by URI, and the avg/mode/max request duration, also by URI.
Here are the pertinent fields logged in each wildfly event:
- _time
- method
- uri
- time_taken
- host
My first query looked like this:
sourcetype=wildfly _logs | bucket _time span=1m | sistats count as request_count avg(time_taken) max(time_taken) mode(time_taken) median(time_taken) by uri host _time
However, this resulted in a lot of noise because uri in its raw form contains unique query strings. I'm only interested in calculating time_taken stats for generic uris (http://www.example.com/somecontroller/someaction vs http://www.example.com/somecontroller/someaction/?QueryString1=foo).
So I tried stripping off the query string portion of uri:
sourcetype=wildfly _logs | rex field=uri "^(?<uri_base_url>.+?)\?" | bucket _time span=1m | sistats count as request_count avg(time_taken) max(time_taken) mode(time_taken) median(time_taken) by uri_base_url host _time
This doesn't work either: request_count is under-counted because of the way I'm stripping off the query string.
I know I can achieve what I'm after by splitting this summary search into two queries, but it feels like something that can be achieved in a single query. Any pointers are appreciated.
Consider using the URL Toolbox app to parse the uri field for you. It uses an external command rather than rex and probably handles edge cases much better.
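If installing an app is not an option, a rex-only variant may also be worth trying. The likely cause of the under-count is that the pattern in the question requires a literal `?`, so any request without a query string never gets a uri_base_url value, and events with a null by-field are left out of the stats results. The sketch below instead captures everything up to the first `?` (or the whole uri when there is none). The base search and field names are copied from the question; the rtrim step is an assumption on my part that a trailing slash should not produce a separate uri, so drop it if that does not hold for your data.

```
sourcetype=wildfly _logs
| rex field=uri "^(?<uri_base_url>[^?]+)"
| eval uri_base_url=rtrim(uri_base_url, "/")
| bucket _time span=1m
| sistats count avg(time_taken) max(time_taken) mode(time_taken) median(time_taken) by uri_base_url host _time
```

With this pattern, http://www.example.com/somecontroller/someaction and http://www.example.com/somecontroller/someaction/?QueryString1=foo should both end up under the same uri_base_url, so the count and the time_taken stats can come from the same summary search.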