Getting Data In

regexp in Splunk

laurentiugrama
Explorer

I tried to find a solution in order to parse some URL to obtain the base but it seems that I cannot succeed.

For the between GET/POST and HTTP I want to return the baseurl as in the examples below

GET /gw/api/aaa/v1/ HTTP - to return /gw/api/aaa/v1
GET /gw/api/abc/v3 HTTP - to return /gw/api/abc/v3
POST /gw/api/cba/ HTTP - to return /gw/api/cba
POST /gw/transactions/swaggers/v2 HTTP - to return /gw/transactions/swaggers/v2
POST /gw/api/swaggers/v1/asd?dssa HTTP - to return /gw/api/swaggers/v1/asd
POST /api/swaggers/ HTTP - to return /api/swaggers
GET /api/cashAccountOpenings/v3/sadsa-123312-1312 HTTP - to return /api/cashAccountOpenings/v3

 

I added this examples to regex101.com to be easier to find a solution.

https://regex101.com/r/oLXtw8/1/

 

Labels (1)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

(?<method>\S+)\s(?<path>/[^?]*)(\?(?<query>\S*))?

Typing on my phone, didn't verify it 😉

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @laurentiugrama,

sorry, what's the question?

the regex seems to be correct, even if I'd use an easier regex:

[GET|POST]\s(?<URL>.+)\s+(HTTP)

Ciao.

Giuseppe

laurentiugrama
Explorer

Thank you for the response but the expression returns URL not  baseurl

As I said, I trayed to obtain the baseurl from an url

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @laurentiugrama,

sorry but I don't understand: what do you mean with baseurl?

the name of the field or a part of the URL?

If you want the name of the field you can modify the regex:

| rex "[GET|POST]\s(?<baseurl>.+)\s+(HTTP)"

if instead you want a part of the URL, e.g. the first two sections, you could use something like this:

| rex "[GET|POST]\s(?<baseurl>\/\w+\/\w+\/)\s+(HTTP)"

Ciao.

Giuseppe

laurentiugrama
Explorer

In the first post I explained what is the URL and what I want to obtain from regext

for first line the url is /gw/api/aaa/v1/ and the baseurl is /gw/api/aaa/v1

GET /gw/api/aaa/v1/ HTTP - to return /gw/api/aaa/v1
GET /gw/api/abc/v3 HTTP - to return /gw/api/abc/v3
POST /gw/api/cba/ HTTP - to return /gw/api/cba
POST /gw/transactions/swaggers/v2 HTTP - to return /gw/transactions/swaggers/v2
POST /gw/api/swaggers/v1/asd?dssa HTTP - to return /gw/api/swaggers/v1/asd
POST /api/swaggers/ HTTP - to return /api/swaggers
GET /api/cashAccountOpenings/v3/sadsa-123312-1312 HTTP - to return /api/cashAccountOpenings/v3

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @laurentiugrama,

ok, sorry for the misunderstanding!

Please try this:

index=your_index
| rex field=URL "[GET|POST]\s(?<baseurl>.+)\s+(HTTP)"
| rex field=baseurl "(?<baseurl1>.+)\/$"
| rex field=baseurl "(?<baseurl2>.+)((\?\w+$)|(\/sadsa.*))"
| eval baseurl=coalesce(baseurl2,baseurl1,baseurl)
| table URL baseurl

Ciao.

Giuseppe

laurentiugrama
Explorer

Your solution it works well.

Do you think that it's possible to have a solution based on only one regexp iteration ?

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @laurentiugrama,

probably it's possible, I'll try tomorrow (today I'm out!).

Ciao and happy splunking.

Giuseppe

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

[Puzzles] Solve, Learn, Repeat: Matching cron expressions

This puzzle (first published here) is based on matching timestamps to cron expressions.All the timestamps ...

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas     Cisco Live 2026 is almost here, and this ...

Data Management Digest – May 2026

Welcome to the May 2026 edition of Data Management Digest!   As your trusted partner in data innovation, the ...