Splunk Search

How to generate a regular expression on a URL to capture a resource path and end on an optional parameter?

bcatwork
Path Finder

Hi all, I am looking for some help for the following use case.

I have a series of endpoints, represented by full URLs and logged across a few sources, which I am trying to normalize and then aggregate on.

I am looking for the resource path, without any optional parameters. That is, I want to capture everything after the double slash [//], the domain name, and the first single slash [/], and end the capture at the optional query string [?].

https://answers.splunk.com/answers/ask.html?foo=bar --> Becomes --> answers/ask.html
https://answers.splunk.com/answers/ask.html --> Becomes --> answers/ask.html
http://docs.splunk.com/Documentation --> Becomes --> Documentation
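
For anyone who wants to check candidate patterns outside Splunk, the three mappings above can be restated as a small Python check. This is only a sketch of the requirement, not a Splunk search; the function name `resource_path` is made up for illustration, and it uses Python's standard URL parser rather than a regular expression.

```python
from urllib.parse import urlsplit


def resource_path(url: str) -> str:
    """Return the URL path without its leading slash or query string."""
    # urlsplit separates scheme, host, path, query, and fragment,
    # so the query string never ends up in .path.
    return urlsplit(url).path.lstrip("/")


# The three examples from the question, with the wanted result for each.
examples = {
    "https://answers.splunk.com/answers/ask.html?foo=bar": "answers/ask.html",
    "https://answers.splunk.com/answers/ask.html": "answers/ask.html",
    "http://docs.splunk.com/Documentation": "Documentation",
}

for url, expected in examples.items():
    assert resource_path(url) == expected
```

Any regex proposed in the answers can be judged against the same three pairs.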


woodcock
Esteemed Legend

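Like this (the first rex strips the query string, the second strips the scheme and host):
| makeresults
| eval URL = mvappend("https://answers.splunk.com/answers/ask.html?foo=bar",
"https://answers.splunk.com/answers/ask.html",
"http://docs.splunk.com/Documentation")
| mvexpand URL
| rex field=URL mode=sed "s/\?.*$//"
| rex field=URL mode=sed "s/^[^\/]+\/\/[^\/]+\///"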

Also, there is an app that does this kind of thing:
https://splunkbase.splunk.com/app/2734


tfujita_splunk
Splunk Employee

I created Splunk macros containing regular expressions for URIs and URLs.

Definitions and usage are in the article below.
https://qiita.com/Joh256/private/659ef65897905890ef99

I also put them in the add-on below.
https://splunkbase.splunk.com/app/6595


gokadroid
Motivator

If your URL is already in the field myUrl, try this:

yourBaseQuery to get myUrl field
| rex field=myUrl "https?:\/\/[^\/]+\/(?<uri>[^\?\s]+)"

Or, try it on _raw:

yourBaseQuery
| rex field=_raw "https?:\/\/[^\/]+\/(?<uri>[^\?\s]+)"
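
To sanity-check this kind of extraction without running a search, here is a rough Python equivalent. It is a sketch under assumptions: `extract_uri` is a hypothetical helper name, and Python's `re` module needs the `(?P<uri>...)` spelling for named groups, whereas rex also accepts `(?<uri>...)`.

```python
import re
from typing import Optional

# Skip the scheme and host, then capture everything up to "?" or whitespace.
URI_PATTERN = re.compile(r"https?://[^/]+/(?P<uri>[^?\s]+)")


def extract_uri(text: str) -> Optional[str]:
    """Return the captured resource path, or None if the text has no URL path."""
    match = URI_PATTERN.search(text)
    return match.group("uri") if match else None
```

Because it uses a search rather than an anchored match, the same pattern works on a bare URL field and on a longer raw event. A URL with no path at all (for example http://docs.splunk.com) produces no match, so downstream aggregation should allow for a missing field.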

pgreer_splunk
Splunk Employee
| rex field=urlField "^[^\/]+\/\/[^\/]+\/(?P<wantedField>[^\s\?;]+).*"

This should handle all three of your example cases, extracting the path into a new field named 'wantedField'.


somesoni2
Revered Legend

You can try either the replace or the rex-sed method to update the url field per your guideline. (Run-anywhere sample below.)

| gentimes start=-1 | eval url="https://answers.splunk.com/answers/ask.html?foo=bar https://answers.splunk.com/answers/ask.html http://docs.splunk.com/Documentation" | makemv url | table url | mvexpand url
| eval url=replace(url,"^[^\/]+\/\/[^\/]+\/([^\s\?;=]+).*","\1") | ...rest of the query

OR

| gentimes start=-1 | eval url="https://answers.splunk.com/answers/ask.html?foo=bar https://answers.splunk.com/answers/ask.html http://docs.splunk.com/Documentation" | makemv url | table url | mvexpand url
| rex mode=sed field=url "s/^[^\/]+\/\/[^\/]+\/([^\s\?;=]+).*/\1/" | ...rest of the query
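
The substitution approach can also be tried outside Splunk. The following Python sketch mirrors the replace pattern above; `normalize` is a hypothetical name, and the slashes are left unescaped because Python does not need them escaped.

```python
import re

# Match the whole URL and capture only the path segment after the host.
PATTERN = re.compile(r"^[^/]+//[^/]+/([^\s?;=]+).*")


def normalize(url: str) -> str:
    """Replace the whole URL with capture group 1 (the resource path)."""
    # If the pattern does not match, re.sub returns the input unchanged.
    return PATTERN.sub(r"\1", url)
```

One thing this makes visible: a URL with no path after the host does not match, so it comes back unchanged instead of empty. I would expect the replace and sed forms to behave the same way, so such values may need separate handling before aggregating.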

aaraneta_splunk
Splunk Employee

Hi @bcatwork - I saw that you up-voted this answer from somesoni2. If this answer did help to solve your question, please don't forget to click "Accept" below the answer to close out this post. If not, please leave a comment with more feedback. Thanks!
