How to use rex to extract values from URLs into a field?

willamwar
Path Finder

Hello all,

From the following list of URLs:

http://www.foo.com:80/main.html
http://www.foo.com:80/xe/journal/v1/book/nF1.jpg
http://www.goo.com:80/fiction/journal/07/40

Where the website is foo, I want to find journal and extract /xe/journal/.
There are other websites, not every target website has journal, and xe can be any number of characters.

I have a working regex

 https?:\/\/.+?(?=foo\.com)[^\/]++(\/?[^\/]+\/(?=journal)journal\/?)

My issue is that I can't figure out how to get rex to put the match into the 'data' field.

| rex field=trimurl "https?:\/\/.+?(?=foo\.com)[^\/]++(?<data>.*)(\/?[^\/]+\/(?=journal)journal\/?)" 

gokadroid
Motivator

To get the value into the field data, the rex part can be written as follows:

rex field=trimurl "https?:\/\/.+?(foo\.com)[^\/]+(?<data>(\/[^\/]+){2}\/)"

See the regex at work here.

If you specifically want the field data to contain the keyword journal together with the variable-length string xe (one or more characters long), in the form /xe/journal/, then try this:

rex field=trimurl "https?:\/\/.+?(foo\.com)[^\/]+(?<data>(\/[^\/]+)\/journal\/)"
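
As a quick sanity check outside Splunk, the two patterns above can be run against the sample URLs from the question. This is only a sketch in Python, not part of the rex search: Python's re module needs the named group written as (?P<data>...) instead of (?<data>...), the escaped slashes are dropped because Python does not need them, and the helper name extract_data and the extra URL ending in /a/b/c.html are made up for illustration.

```python
import re

# First pattern from the answer: any two path segments after the host and port.
TWO_SEGMENTS = re.compile(r"https?://.+?(foo\.com)[^/]+(?P<data>(/[^/]+){2}/)")

# Second pattern from the answer: one path segment followed by /journal/.
JOURNAL_ONLY = re.compile(r"https?://.+?(foo\.com)[^/]+(?P<data>(/[^/]+)/journal/)")

# The three sample URLs from the question.
SAMPLE_URLS = [
    "http://www.foo.com:80/main.html",
    "http://www.foo.com:80/xe/journal/v1/book/nF1.jpg",
    "http://www.goo.com:80/fiction/journal/07/40",
]

# Hypothetical URL (not from the question) showing where the two patterns differ.
EXTRA_URL = "http://www.foo.com:80/a/b/c.html"


def extract_data(pattern, url):
    """Return what the 'data' group captures, or None when the pattern does not match."""
    match = pattern.search(url)  # unanchored search, like rex
    return match.group("data") if match else None


for url in SAMPLE_URLS + [EXTRA_URL]:
    print(url)
    print("  two segments:", extract_data(TWO_SEGMENTS, url))
    print("  journal only:", extract_data(JOURNAL_ONLY, url))
```

With the sample URLs, only the second one yields a value (/xe/journal/) under either pattern. The made-up URL matches the first pattern (/a/b/) but not the second, which is the reason to prefer the second form when only journal paths are wanted.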

woodcock
Esteemed Legend

Also, check out URL Toolbox:
https://splunkbase.splunk.com/app/2734/

gokadroid
Motivator

oh damn...thanks...if you can accept the answer for it to be closed then that will help too...editing the answer as per your need and to correct my mistake..

willamwar
Path Finder

Thank you so much.

Could you fix the spelling of feld --> field so that others don't get an error and have to figure that out?