Hello Everyone,
Below is a set of sample log entries:
"message":{"input":"999.111.000.999 - - [06/Apr/2023:05:45:51 +0000] \"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary HTTP/1.1\" 200 636 8080 13 ms"}
"message":{"input":"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP/1.1\" 200 1855 8080 10 ms"}
"message":{"input":"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \"GET /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product HTTP/1.1\" 200 1855 8080 10 ms"}
"message":{"input":"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\" 200 1855 8080 10 ms"}
From the above, I am interested in extracting only the highlighted string, e.g.:
GET /shopping/carts/v1/<ending with an ID alone> HTTP
I tried the Splunk query below as an intermediate step to extract the URLs:
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner | rex field=message.input "(?<servicename>(?:[^\"]|\"\")*HTTP)" | dedup servicename | stats count by servicename
servicename is a pre-extracted variable.
But this query returns all the patterns:
GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary HTTP
GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP (I need only this)
GET /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product HTTP
GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP
Please help
(Somehow my previous reply was lost.) I try not to reinvent regex when robust, vendor-supported options exist. message.input is a standard NCSA/Apache access log, and Splunk provides several built-in extractions for such formats. I'll use access-extractions as an example.
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner
| rename _raw as temp, message.input as _raw
| extract access-extractions
This will give you
bytes | clientip | cookie | file | ident | method | module | other | referer | referer_domain | req_time | requestedPoint | root | status | uri | uri_domain | uri_path | uri_query | user | useragent | version |
636 | 999.111.000.999 | | summary | - | GET | | 8080 13 ms | | | 06/Apr/2023:05:45:51 +0000 | | shopping | 200 | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary | | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary | | - | | HTTP/1.1 |
1855 | 999.111.000.999 | | 83h3h331-g494-28h4-yyw7-dq123123123d | - | GET | | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | | shopping | 200 | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d | | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d | | - | | HTTP/1.1 |
1855 | 999.111.000.999 | | product | - | GET | | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | | shopping | 200 | /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product | | /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product | | - | | HTTP/1.1 |
1855 | 999.111.000.999 | | CJS | - | GET | ONLINE_BOOKING | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | DESTINATION | location-context | 200 | /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION | | /location-context/stations/v1/CJS | module=ONLINE_BOOKING&requestedPoint=DESTINATION | - | | HTTP/1.1 |
So, your ask is to get method + uri_path + version for select events. Speaking of select, you should do the selection in the main search. That's why the following code adds "GET /shopping/carts/" as a search term.
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner
"GET /shopping/carts/"
| rename _raw as temp, message.input as _raw ``` extract works on _raw ```
| extract access-extractions
| where mvcount(split(uri_path, "/")) = 5 ``` nothing after cart ID ```
| eval ask = method . " " . uri_path . " " . mvindex(split(version, "/"), 0)
The end result, of course, is
ask |
GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP |
Obviously, the final segment of the concatenation above could just as well be hard-coded as "HTTP". But I wanted to highlight how unusual it is to take just the protocol without the actual version, because the version makes a difference in applications.
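To see why the where clause compares against 5, here is a small runanywhere sketch. The two paths are taken from the sample data; segments and keep are field names made up for illustration. The leading slash should contribute an empty first element to the split, so a path with nothing after the cart ID counts as 5, and a path with a trailing sub-resource counts higher.

```spl
| makeresults
| eval uri_path = split("/shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d,/shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary", ",")
| mvexpand uri_path
| eval segments = mvcount(split(uri_path, "/")) ``` hypothetical helper field: number of "/"-separated pieces ```
| eval keep = if(segments == 5, "yes", "no") ``` hypothetical helper field: mirrors the where clause ```
| table uri_path segments keep
```

If your real paths have a different depth (say, an extra prefix in front of /shopping), the number 5 would need adjusting accordingly.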
Anyway, the following is an emulation that you can play with and compare with real data.
| makeresults
| eval data = split("{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:05:45:51 +0000] \\\"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary HTTP/1.1\\\" 200 636 8080 13 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP/1.1\\\" 200 1855 8080 10 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product HTTP/1.1\\\" 200 1855 8080 10 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\\\" 200 1855 8080 10 ms\"}}", "
")
| mvexpand data
| spath input=data
``` the above emulates search
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner```
| where match('message.input', "GET /shopping/carts/")
``` the above emulates search
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner "GET /shopping/carts/"```
@ITWhisperer the runanywhere example works as expected.
I guess there are more patterns that I failed to include, and those are being returned as well. Hence I updated the runanywhere example as below:
| makeresults
| fields - _time
| eval _raw="\"message\":{\"input\":\"192.168.62.10 - - [06/Apr/2023:05:45:51 +0000] \\\"GET /shopping/carts/v1/e5aa581b-ac7a-40f5-a8da-8ab5cb51039c/summary HTTP/1.1\\\" 200 636 8080 13 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55 HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55/product HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\": {\"input\": \"192.168.62.10 - - [15/Apr/2023:03:32:22 +0000] \\\"GET /shopping/carts/v1/152c1299-e598-40d3-8934-29f6662bbb98?productType=ALL HTTP/1.1\\\" 200 1828 8080 13 ms\"}"
| multikv noheader=t
| fields _raw
``` the lines above just set up the example events ```
| rex "\"(?<url>GET /shopping/carts/v1/[^/ ]+\sHTTP)"
Try this:
| rex "\"(?<url>GET /shopping/carts/v1/[^/ ?]+\sHTTP)"
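The difference between the two patterns can be checked side by side with a sketch like the one below. It uses the query-string event from the updated example; loose and strict are field names made up for illustration. With [^/ ]+ the "?productType=ALL" part is swallowed by the character class, so the match still reaches " HTTP"; adding ? to the excluded characters should stop the match at the question mark, leaving strict empty for this event.

```spl
| makeresults
| eval _raw = "\"GET /shopping/carts/v1/152c1299-e598-40d3-8934-29f6662bbb98?productType=ALL HTTP/1.1\" 200 1828 8080 13 ms"
| rex "\"(?<loose>GET /shopping/carts/v1/[^/ ]+\sHTTP)" ``` hypothetical field: original pattern ```
| rex "\"(?<strict>GET /shopping/carts/v1/[^/ ?]+\sHTTP)" ``` hypothetical field: pattern that also excludes "?" ```
| table loose strict
```

Note that this drops URLs with a query string entirely. If you would rather keep the cart ID and ignore the query string, that needs a different pattern.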
I would not invent regex when there are robust, vendor-supported ones. Your message.input is a standard access log from NCSA/Apache httpd. So, leverage the access-extractions that come with Splunk itself.
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner
| rename message.input as _raw
| extract access-extractions
This should give you these fields from the illustrated data
bytes | clientip | cookie | file | ident | method | module | other | referer | referer_domain | req_time | requestedPoint | root | status | uri | uri_domain | uri_path | uri_query | user | useragent | version |
636 | 999.111.000.999 | | summary | - | GET | | 8080 13 ms | | | 06/Apr/2023:05:45:51 +0000 | | shopping | 200 | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary | | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary | | - | | HTTP/1.1 |
1855 | 999.111.000.999 | | 83h3h331-g494-28h4-yyw7-dq123123123d | - | GET | | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | | shopping | 200 | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d | | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d | | - | | HTTP/1.1 |
1855 | 999.111.000.999 | | product | - | GET | | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | | shopping | 200 | /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product | | /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product | | - | | HTTP/1.1 |
1855 | 999.111.000.999 | | CJS | - | GET | ONLINE_BOOKING | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | DESTINATION | location-context | 200 | /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION | | /location-context/stations/v1/CJS | module=ONLINE_BOOKING&requestedPoint=DESTINATION | - | | HTTP/1.1 |
What you are saying is that you only want method + uri_path + version for some select events. So, work from these extracted fields and build what you asked for. BTW, you should eliminate those that don't contain shopping carts first. So, that's how I built it:
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner "GET /shopping/carts/"
| rename _raw AS temp, message.input AS _raw ``` in case original _raw is needed later ```
| extract access-extractions
| where mvcount(split(uri_path, "/")) = 5
| eval ask = method . " " . uri_path . " " . mvindex(split(version, "/"), 0)
Note the final string in the above concatenation could as well be "HTTP". But I want to highlight how unusual it is to want the string HTTP without actual version, because HTTP version makes a difference in applications. Anyway, using your illustrated data, my emulation gives
ask |
GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP |
The following is my emulation that you can play with and compare with real data
| makeresults
| eval data = split("{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:05:45:51 +0000] \\\"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary HTTP/1.1\\\" 200 636 8080 13 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP/1.1\\\" 200 1855 8080 10 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product HTTP/1.1\\\" 200 1855 8080 10 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\\\" 200 1855 8080 10 ms\"}}", "
")
| mvexpand data
| spath input=data
| fields - _time data
| rename message.input as _raw
| search "GET /shopping/carts/"
``` the above emulates search
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner "GET /shopping/carts/"```
| rex "GET\s+\/shopping\/carts\/v\d+\/(?<justAcart>[^\/]+)\s+HTTP"
| rex "\"(?<url>GET /shopping/carts/v1/[^/ ]+\sHTTP)"
@ITWhisperer unfortunately it is still returning all patterns:
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner | rex "\"(?<url>GET /shopping/carts/v1/[^/ ]+\sHTTP)"
Interesting! Here is a runanywhere example showing it working.
| makeresults
| fields - _time
| eval _raw="\"message\":{\"input\":\"192.168.62.10 - - [06/Apr/2023:05:45:51 +0000] \\\"GET /shopping/carts/v1/e5aa581b-ac7a-40f5-a8da-8ab5cb51039c/summary HTTP/1.1\\\" 200 636 8080 13 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55 HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55/product HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\\\" 200 1855 8080 10 ms\"}"
| multikv noheader=t
| fields _raw
``` the lines above just set up the example events ```
| rex "\"(?<url>GET /shopping/carts/v1/[^/ ]+\sHTTP)"
This raises the question: what is it about the events being returned that causes them to have the field extracted? Unless you share the actual events, you will have to figure that out for yourself!
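One thing worth ruling out (this is only a guess, since the actual events are not shown): rex adds a field to the events where the pattern matches, but it never removes events from the results. So every event still comes back, and only some of them carry url. A minimal sketch of filtering explicitly on the extracted field, using two made-up events modeled on the sample data (n is a helper field made up for illustration):

```spl
| makeresults count=2
| streamstats count as n ``` hypothetical helper field: numbers the two test events ```
| eval _raw = if(n == 1, "\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55 HTTP/1.1\" 200 1855 8080 10 ms", "\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55/product HTTP/1.1\" 200 1855 8080 10 ms")
| rex "\"(?<url>GET /shopping/carts/v1/[^/ ?]+\sHTTP)"
| where isnotnull(url) ``` keep only events where the pattern matched ```
| stats count by url
```

Without the where line, both events would still be listed, the second one simply with an empty url.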