Getting Data In

regexp in Splunk

laurentiugrama
Explorer

I tried to find a solution in order to parse some URL to obtain the base but it seems that I cannot succeed.

For the between GET/POST and HTTP I want to return the baseurl as in the examples below

GET /gw/api/aaa/v1/ HTTP - to return /gw/api/aaa/v1
GET /gw/api/abc/v3 HTTP - to return /gw/api/abc/v3
POST /gw/api/cba/ HTTP - to return /gw/api/cba
POST /gw/transactions/swaggers/v2 HTTP - to return /gw/transactions/swaggers/v2
POST /gw/api/swaggers/v1/asd?dssa HTTP - to return /gw/api/swaggers/v1/asd
POST /api/swaggers/ HTTP - to return /api/swaggers
GET /api/cashAccountOpenings/v3/sadsa-123312-1312 HTTP - to return /api/cashAccountOpenings/v3

 

I added this examples to regex101.com to be easier to find a solution.

https://regex101.com/r/oLXtw8/1/

 

Labels (1)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

(?<method>\S+)\s(?<path>/[^?]*)(\?(?<query>\S*))?

Typing on my phone, didn't verify it 😉

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @laurentiugrama,

sorry, what's the question?

the regex seems to be correct, even if I'd use an easier regex:

[GET|POST]\s(?<URL>.+)\s+(HTTP)

Ciao.

Giuseppe

laurentiugrama
Explorer

Thank you for the response but the expression returns URL not  baseurl

As I said, I trayed to obtain the baseurl from an url

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @laurentiugrama,

sorry but I don't understand: what do you mean with baseurl?

the name of the field or a part of the URL?

If you want the name of the field you can modify the regex:

| rex "[GET|POST]\s(?<baseurl>.+)\s+(HTTP)"

if instead you want a part of the URL, e.g. the first two sections, you could use something like this:

| rex "[GET|POST]\s(?<baseurl>\/\w+\/\w+\/)\s+(HTTP)"

Ciao.

Giuseppe

laurentiugrama
Explorer

In the first post I explained what is the URL and what I want to obtain from regext

for first line the url is /gw/api/aaa/v1/ and the baseurl is /gw/api/aaa/v1

GET /gw/api/aaa/v1/ HTTP - to return /gw/api/aaa/v1
GET /gw/api/abc/v3 HTTP - to return /gw/api/abc/v3
POST /gw/api/cba/ HTTP - to return /gw/api/cba
POST /gw/transactions/swaggers/v2 HTTP - to return /gw/transactions/swaggers/v2
POST /gw/api/swaggers/v1/asd?dssa HTTP - to return /gw/api/swaggers/v1/asd
POST /api/swaggers/ HTTP - to return /api/swaggers
GET /api/cashAccountOpenings/v3/sadsa-123312-1312 HTTP - to return /api/cashAccountOpenings/v3

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @laurentiugrama,

ok, sorry for the misunderstanding!

Please try this:

index=your_index
| rex field=URL "[GET|POST]\s(?<baseurl>.+)\s+(HTTP)"
| rex field=baseurl "(?<baseurl1>.+)\/$"
| rex field=baseurl "(?<baseurl2>.+)((\?\w+$)|(\/sadsa.*))"
| eval baseurl=coalesce(baseurl2,baseurl1,baseurl)
| table URL baseurl

Ciao.

Giuseppe

laurentiugrama
Explorer

Your solution it works well.

Do you think that it's possible to have a solution based on only one regexp iteration ?

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @laurentiugrama,

probably it's possible, I'll try tomorrow (today I'm out!).

Ciao and happy splunking.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

The OpenTelemetry Certified Associate (OTCA) Exam

What’s this OTCA exam? The Linux Foundation offers the OpenTelemetry Certified Associate (OTCA) credential to ...

From Manual to Agentic: Level Up Your SOC at Cisco Live

Welcome to the Era of the Agentic SOC   Are you tired of being a manual alert responder? The security ...

Splunk Classroom Chronicles: Training Tales and Testimonials (Episode 4)

Welcome back to Splunk Classroom Chronicles, our ongoing series where we shine a light on what really happens ...