Splunk Search

How to extract a particular URL pattern while excluding variants with trailing text?

super_edition
Path Finder

Hello Everyone,

Below is a set of sample log entries:

"message":{"input":"999.111.000.999 - - [06/Apr/2023:05:45:51 +0000] \"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary HTTP/1.1\" 200 636 8080 13 ms"}


"message":{"input":"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP/1.1\" 200 1855 8080 10 ms"}


"message":{"input":"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \"GET /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product HTTP/1.1\" 200 1855 8080 10 ms"}

"message":{"input":"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\" 200 1855 8080 10 ms"}

From the above, I want to extract only the string that ends with the ID alone (highlighted in orange in the original post), e.g.:

GET /shopping/carts/v1/<ending with any id alone> HTTP

As an intermediate step, I tried the Splunk query below to extract the URLs:

index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner | rex field=message.input "(?<servicename>(?:[^\"]|\"\")*HTTP)" | dedup servicename | stats count by servicename

servicename is a pre-extracted variable.

But this query returns all of the patterns:

GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary HTTP
GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP (I need only this)
GET /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product HTTP
GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP

Please help


ITWhisperer
SplunkTrust

Try this:

| rex "\"(?<url>GET /shopping/carts/v1/[^/ ?]+\sHTTP)"


yuanliu
SplunkTrust

(Somehow my previous reply was lost.) I try not to reinvent regex when robust, vendor-supported options exist.  The content of message.input is a standard NCSA/Apache access log, and Splunk provides several built-in standard extractions for it.  I'll use access-extractions as an example.

 

index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner
| rename _raw as temp, message.input as _raw
| extract access-extractions

 

This will give you

| bytes | clientip | cookie | file | ident | method | module | other | referer | referer_domain | req_time | requestedPoint | root | status | uri | uri_domain | uri_path | uri_query | user | useragent | version |
| 636 | 999.111.000.999 | | summary | - | GET | | 8080 13 ms | | | 06/Apr/2023:05:45:51 +0000 | | shopping | 200 | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary | | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary | | - | | HTTP/1.1 |
| 1855 | 999.111.000.999 | | 83h3h331-g494-28h4-yyw7-dq123123123d | - | GET | | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | | shopping | 200 | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d | | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d | | - | | HTTP/1.1 |
| 1855 | 999.111.000.999 | | product | - | GET | | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | | shopping | 200 | /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product | | /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product | | - | | HTTP/1.1 |
| 1855 | 999.111.000.999 | | CJS | - | GET | ONLINE_BOOKING | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | DESTINATION | location-context | 200 | /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION | | /location-context/stations/v1/CJS | module=ONLINE_BOOKING&requestedPoint=DESTINATION | - | | HTTP/1.1 |

So, what you are asking for is method + uri_path + version for selected events.  Speaking of selection, you should do the filtering in the main search.  That's why the following code adds "GET /shopping/carts/" to it.

 

index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner
"GET /shopping/carts/"
| rename _raw as temp, message.input as _raw ``` extract operates on _raw; keep the original in temp ```
| extract access-extractions
| where mvcount(split(uri_path, "/")) = 5 ``` nothing after cart ID ```
| eval ask = method . " " . uri_path . " " . mvindex(split(version, "/"), 0)

 

The end result, of course, is

ask
GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP

Obviously, the final segment of the concatenation above could just as well be hard-coded as "HTTP".  But I wanted to highlight how unusual it is to take just the protocol without the actual version, because the version makes a difference in applications.
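On the mvcount(split(uri_path, "/")) = 5 filter used above: the end result shown implies that the ID-only path counts as 5 pieces, meaning the empty string before the leading slash is counted too. The following is a minimal run-anywhere sketch to check this, assuming two hand-built paths taken from the sample data; the field name segments is hypothetical.

```
| makeresults
| eval uri_path = split("/shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary", " ")
| mvexpand uri_path
| eval segments = mvcount(split(uri_path, "/"))
| table uri_path segments
```

If the count behaves as in the answer, the ID-only path should show 5 and the /summary path 6, so only the former passes the where clause.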

Anyway, the following is an emulation that you can play with and compare with real data.

 

| makeresults
| eval data = split("{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:05:45:51 +0000] \\\"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary HTTP/1.1\\\" 200 636 8080 13 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP/1.1\\\" 200 1855 8080 10 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product HTTP/1.1\\\" 200 1855 8080 10 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\\\" 200 1855 8080 10 ms\"}}", "
")
| mvexpand data
| spath input=data
``` the above emulates the search
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner```
| where match('message.input', "GET /shopping/carts/")
``` the above emulates the search
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner "GET /shopping/carts/"```

 

 

 


super_edition
Path Finder

@ITWhisperer  the runanywhere example works as expected.

It seems there is one more pattern that I forgot to include, and it is being returned as well. I have therefore updated the run-anywhere example as below:

 

| makeresults
| fields - _time
| eval _raw="\"message\":{\"input\":\"192.168.62.10 - - [06/Apr/2023:05:45:51 +0000] \\\"GET /shopping/carts/v1/e5aa581b-ac7a-40f5-a8da-8ab5cb51039c/summary HTTP/1.1\\\" 200 636 8080 13 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55 HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55/product HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\": {\"input\": \"192.168.62.10 - - [15/Apr/2023:03:32:22 +0000] \\\"GET /shopping/carts/v1/152c1299-e598-40d3-8934-29f6662bbb98?productType=ALL HTTP/1.1\\\" 200 1828 8080 13 ms\"}"
| multikv noheader=t
| fields _raw
``` the lines above just set up the example events ```
| rex "\"(?<url>GET /shopping/carts/v1/[^/ ]+\sHTTP)"

 

Output: [screenshot: super_edition_0-1681531171281.png]


ITWhisperer
SplunkTrust

Try this:

| rex "\"(?<url>GET /shopping/carts/v1/[^/ ?]+\sHTTP)"
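To see what the added ? changes, here is a minimal run-anywhere sketch. It assumes a single hand-built event containing the query-string variant from the updated example; the field names url_before and url_after are hypothetical, chosen only for this comparison.

```
| makeresults
| eval _raw="\"GET /shopping/carts/v1/152c1299-e598-40d3-8934-29f6662bbb98?productType=ALL HTTP/1.1\""
| rex "\"(?<url_before>GET /shopping/carts/v1/[^/ ]+\sHTTP)"
| rex "\"(?<url_after>GET /shopping/carts/v1/[^/ ?]+\sHTTP)"
| table url_before url_after
```

With [^/ ] the character class runs through the query string up to the space, so url_before should be populated. With [^/ ?] the class stops at the ?, the following \s cannot match, and url_after should stay empty.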

super_edition
Path Finder

@ITWhisperer  - Thanks.  It is now returning the expected pattern alone.


yuanliu
SplunkTrust

I would not invent a regex when robust, vendor-supported ones exist.  Your message.input is a standard access log from NCSA/Apache httpd.  So, leverage access-extractions, which comes with Splunk itself.

 

index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner
| rename message.input as _raw
| extract access-extractions

 

This should give you these fields from the illustrated data:

| bytes | clientip | cookie | file | ident | method | module | other | referer | referer_domain | req_time | requestedPoint | root | status | uri | uri_domain | uri_path | uri_query | user | useragent | version |
| 636 | 999.111.000.999 | | summary | - | GET | | 8080 13 ms | | | 06/Apr/2023:05:45:51 +0000 | | shopping | 200 | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary | | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary | | - | | HTTP/1.1 |
| 1855 | 999.111.000.999 | | 83h3h331-g494-28h4-yyw7-dq123123123d | - | GET | | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | | shopping | 200 | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d | | /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d | | - | | HTTP/1.1 |
| 1855 | 999.111.000.999 | | product | - | GET | | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | | shopping | 200 | /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product | | /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product | | - | | HTTP/1.1 |
| 1855 | 999.111.000.999 | | CJS | - | GET | ONLINE_BOOKING | 8080 10 ms | | | 06/Apr/2023:04:08:13 +0000 | DESTINATION | location-context | 200 | /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION | | /location-context/stations/v1/CJS | module=ONLINE_BOOKING&requestedPoint=DESTINATION | - | | HTTP/1.1 |

What you are saying is that you only want method + uri_path + version for some selected events.  So, work from these extracted fields and build what you asked for.  By the way, you should first eliminate the events that don't contain shopping carts.  This is how I built it:

 

index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner "GET /shopping/carts/"
| rename _raw AS temp, message.input AS _raw ``` in case original _raw is needed later ```
| extract access-extractions
| where mvcount(split(uri_path, "/")) = 5
| eval ask = method . " " . uri_path . " " . mvindex(split(version, "/"), 0)

 

Note that the final string in the above concatenation could just as well be "HTTP".  But I want to highlight how unusual it is to want the string HTTP without the actual version, because the HTTP version makes a difference in applications.  Anyway, using your illustrated data, my emulation gives

ask
GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP

 

The following is my emulation, which you can play with and compare with real data:

 

| makeresults
| eval data = split("{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:05:45:51 +0000] \\\"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d/summary HTTP/1.1\\\" 200 636 8080 13 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/83h3h331-g494-28h4-yyw7-dq123123123d HTTP/1.1\\\" 200 1855 8080 10 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/73737373-j3j3-8djd-jdjd-kejdjehi3nej/product HTTP/1.1\\\" 200 1855 8080 10 ms\"}}
{\"message\":{\"input\":\"999.111.000.999 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\\\" 200 1855 8080 10 ms\"}}", "
")
| mvexpand data
| spath input=data
| fields - _time data
| rename message.input as _raw
| search "GET /shopping/carts/"
``` the above emulates the search
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner "GET /shopping/carts/"```

 


woodcock
Esteemed Legend

| rex "GET\s+\/shopping\/carts\/v\d+\/(?<justAcart>[^\/]+)\s+HTTP"
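Note that this variant captures only the cart ID (as justAcart) rather than the whole request string. Because [^\/]+ excludes only the slash, an ID followed by a query string such as ?productType=ALL would still match, with the query string ending up inside justAcart. The following is a sketch of a tightened version that also excludes whitespace and ?, assuming the same event layout as in the question; the where and stats lines are illustrative additions, not part of the original answer.

```
| rex "GET\s+\/shopping\/carts\/v\d+\/(?<justAcart>[^\/\s?]+)\s+HTTP"
| where isnotnull(justAcart)
| stats count by justAcart
```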


ITWhisperer
SplunkTrust
| rex "\"(?<url>GET /shopping/carts/v1/[^/ ]+\sHTTP)"

super_edition
Path Finder

@ITWhisperer  unfortunately it is still returning all patterns:

index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner | rex "\"(?<url>GET /shopping/carts/v1/[^/ ]+\sHTTP)"


ITWhisperer
SplunkTrust
SplunkTrust

Interesting! Here is a runanywhere example showing it working.

| makeresults
| fields - _time
| eval _raw="\"message\":{\"input\":\"192.168.62.10 - - [06/Apr/2023:05:45:51 +0000] \\\"GET /shopping/carts/v1/e5aa581b-ac7a-40f5-a8da-8ab5cb51039c/summary HTTP/1.1\\\" 200 636 8080 13 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55 HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /shopping/carts/v1/734b2f55-c304-49a5-baa9-8e9994495b55/product HTTP/1.1\\\" 200 1855 8080 10 ms\"}
\"message\":{\"input\":\"192.168.54.47 - - [06/Apr/2023:04:08:13 +0000] \\\"GET /location-context/stations/v1/CJS?module=ONLINE_BOOKING&requestedPoint=DESTINATION HTTP/1.1\\\" 200 1855 8080 10 ms\"}"
| multikv noheader=t
| fields _raw
``` the lines above just set up the example events ```
| rex "\"(?<url>GET /shopping/carts/v1/[^/ ]+\sHTTP)"

This raises the question: what is it about the returned events that causes the field to be extracted from them? Unless you share the actual events, you will have to figure that out for yourself!
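One way to chase this down is sketched below, assuming the base search from the question is used unchanged. rex only adds the url field to events that match and does not drop the others, so the sketch keeps only events where url exists and lists the distinct extracted values. Any unexpected variant, such as an ID followed by a query string, should then stand out.

```
index=my_index openshift_cluster="cluster009" sourcetype=openshift_logs openshift_namespace=my_ns openshift_container_name=contaner
| rex "\"(?<url>GET /shopping/carts/v1/[^/ ]+\sHTTP)"
| where isnotnull(url)
| stats count by url
| sort - count
```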
