I have an API with a number of endpoints, e.g. /health, /version, /specification and so on.
I have a query that extracts the response time from the logs into a field named duration and computes stats on it. Here is the query:
index=*my_index* namespace="my_namespace" "Logged Request" | rex field=log "OK:(?<duration>\d+)"| stats avg(duration) as Response_Time by log
Here's an example of a generated log line:
" 2021-08-23 20:36:15.627 [INFO ] [-19069] a.a.ActorSystemImpl - Logged Request:HttpMethod(POST):http://host/manager/resource_identifier/ids/getOrCreate/bulk?streaming=true&dscid=52wu9o-b5bf6995-38c5-4472-90d7-2f3edb780276:200 OK:2622 "
The query that I've shared extracts the duration from every log.
I need to find a way to extract the name of the endpoint and calculate the mean duration for each endpoint.
Can someone help me with this? I am really new to Splunk.
Add the following additional rex statement to the search, after the first rex:
| rex field=log ":(?<url>http[^? ]*)"
or you could use this single rex statement, which gets url, status (200), statusMessage (OK) and duration in a single pass:
| rex field=log ":(?<url>https?:[^?: ]*)[^:]*:(?<status>\d+)\s?(?<statusMessage>[^:]*):(?<duration>\d+)"
The url field will contain everything up to, but not including, the ? that starts the query parameters. Did you want only part of the path? If so, what rules determine which part you need?
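If you want to sanity-check the single-pass pattern outside Splunk, here is a minimal Python sketch run against the sample log line from the question. It is only an illustration, not part of the Splunk search: Python writes named groups as (?P<name>...) rather than (?<name>...), and the variable names log, pattern and fields are my own.

```python
import re

# Sample log line copied from the question.
log = (
    " 2021-08-23 20:36:15.627 [INFO ] [-19069] a.a.ActorSystemImpl - "
    "Logged Request:HttpMethod(POST):http://host/manager/resource_identifier/"
    "ids/getOrCreate/bulk?streaming=true"
    "&dscid=52wu9o-b5bf6995-38c5-4472-90d7-2f3edb780276:200 OK:2622 "
)

# Same pattern as the single rex above, with Python-style named groups.
pattern = re.compile(
    r":(?P<url>https?:[^?: ]*)[^:]*:(?P<status>\d+)\s?"
    r"(?P<statusMessage>[^:]*):(?P<duration>\d+)"
)

match = pattern.search(log)
fields = match.groupdict() if match else {}

for name, value in fields.items():
    print(f"{name} = {value}")
# url = http://host/manager/resource_identifier/ids/getOrCreate/bulk
# status = 200
# statusMessage = OK
# duration = 2622
```

Note that the url group stops at the first ?, : or space, so a host written with an explicit port (host:8080) would not be captured correctly by this pattern.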
Thanks for your response. I'll try to explain the requirement with this example:
http://host/manager/resource_identifier/ids/getOrCreate/bulk?streaming=true&dscid=LuSxrA-1c42bb5b-f862-4861-892f-69320e1a59e7:200 OK:22
http://host/manager/resource_identifier/ids/getOrCreate/bulk?dscid=LuSxrA-1c42bb5b-f862-4861-892f-69320e1a59e7:200 Created:78
http://host/manager/resource_identifier/storage/import:200 OK:100
http://host/manager/resource_identifier/storage/import:200 OK:20
I need to generate a result like this:
getOrCreate 50 (mean of 22 and 78)
import 60 (mean of 100 and 20)
Is this possible with Splunk?
Yes, everything is possible with Splunk 😀
You can paste this query into a search window and it will give you the results you want.
The important parts are the rex and stats statements. Note that the query contains 3 rex statements and you only need ONE of them. I have shown the different options in case you also want to split by status, for example if the operation can return errors. All three extract url, operation and duration.
First rex = also extracts the status code and the status message
Second rex = also extracts the status code
Third rex = extracts just the operation (plus url and duration)
| makeresults
| eval _raw="log
http://host/manager/resource_identifier/ids/getOrCreate/bulk?streaming=true&dscid=LuSxrA-1c42bb5b-f862-4861-892f-69320e1a59e7:200 OK:22
http://host/manager/resource_identifier/ids/getOrCreate/bulk?dscid=LuSxrA-1c42bb5b-f862-4861-892f-69320e1a59e7:200 Created:78
http://host/manager/resource_identifier/storage/import:200 OK:100
http://host/manager/resource_identifier/storage/import:200 OK:20"
| multikv forceheader=1
| mvexpand log
| table log
| rex field=log "(?<url>https?://([^/]*/){4})(?<operation>[^/]*)[^:]*:(?<status>\d+)\s?(?<statusMessage>[^:]*):(?<duration>\d+)"
| rex field=log "(?<url>https?://([^/]*/){4})(?<operation>[^/]*)[^:]*:(?<status>\d+)[^:]*:(?<duration>\d+)"
| rex field=log "(?<url>https?://([^/]*/){4})(?<operation>[^/]*)([^:]*:){2}(?<duration>\d+)"
| stats avg(duration) as AvgDuration by operation
| eval AvgDuration=round(AvgDuration)
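Note that the regex works on this principle: after http:// (or https://) it skips exactly four slash-terminated segments (here host/, manager/, resource_identifier/ and ids/ or storage/) and captures the next segment as operation. That segment ends at the next /, or at the : before the status code when there is no further /. The duration is the number after the last colon.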
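As a rough cross-check outside Splunk, this Python sketch applies the third rex pattern to the four sample lines and averages the durations per operation, which is what the stats and eval steps do. It is an illustration only: Python uses (?P<name>...) for named groups, and the names lines, pattern, durations and averages are my own.

```python
import re
from collections import defaultdict

# The four sample lines from the example above.
lines = [
    "http://host/manager/resource_identifier/ids/getOrCreate/bulk?streaming=true"
    "&dscid=LuSxrA-1c42bb5b-f862-4861-892f-69320e1a59e7:200 OK:22",
    "http://host/manager/resource_identifier/ids/getOrCreate/bulk"
    "?dscid=LuSxrA-1c42bb5b-f862-4861-892f-69320e1a59e7:200 Created:78",
    "http://host/manager/resource_identifier/storage/import:200 OK:100",
    "http://host/manager/resource_identifier/storage/import:200 OK:20",
]

# Third rex: skip four slash-terminated segments, capture the next one as the
# operation, then skip two colon-terminated chunks and capture the duration.
pattern = re.compile(
    r"(?P<url>https?://(?:[^/]*/){4})(?P<operation>[^/]*)"
    r"(?:[^:]*:){2}(?P<duration>\d+)"
)

# Group durations by operation (the "by operation" part of stats).
durations = defaultdict(list)
for line in lines:
    match = pattern.search(line)
    if match:
        durations[match["operation"]].append(int(match["duration"]))

# Average and round, like avg(duration) followed by round().
averages = {op: round(sum(vals) / len(vals)) for op, vals in durations.items()}

for op, avg in averages.items():
    print(op, avg)
# getOrCreate 50
# import 60
```

The pattern relies on the operation always being the fifth segment after http://, so URLs with a different path depth would need a different repeat count than {4}.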
Hope this helps