Splunk Search

How to extract URL and text after a string?

firoagni
Engager

Hi,

I would like to extract fields from unstructured data that contains multiple labels, each followed by an HTML anchor (href) tag:

Sample events:

 

 

Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>

Change: <a href="https://xxyyzz.com/changes/1345">#1345</a> - Review: <a href="https://xxyyzz.com/reviews/7891">#7891</a>

Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>

 

 

I would like to get the following results for the above data:

 

 

change_url                       change review_url                      review
https://xxyyzz.com/changes/12345 #12345 https://xxyyzz.com/reviews/7890 #7890 
https://xxyyzz.com/changes/1345  #1345  https://xxyyzz.com/reviews/7891 #7891
                                        https://zzyyyxxx/reviews/205657 #205657

 

 

Can someone suggest how I can use rex to obtain the above fields?

1 Solution

yuanliu
SplunkTrust

Try

 

| kv pairdelim="-" kvdelim=":\s"
| foreach Change Review
    [rex field=<<FIELD>> "href=(?<<<FIELD>>_url>[^\>]+)>(?<<<FIELD>>_value>[^\<]+)"]

 

This is an emulation that you can play with and compare with real data:

 

| makeresults
| eval data = mvappend("Change: <a href=\"https://xxyyzz.com/changes/12345\">#12345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7890\">#7890</a>",
"Change: <a href=\"https://xxyyzz.com/changes/1345\">#1345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7891\">#7891</a>",
"Review: <a href=\"https://zzyyyxxx/reviews/205657\">205657</a>")
| mvexpand data
| rename data AS _raw
``` data emulation above ```

 

Putting the two together, I get:

Change  Change_url                        Review  Review_url
#12345  https://xxyyzz.com/changes/12345  #7890   https://xxyyzz.com/reviews/7890
#1345   https://xxyyzz.com/changes/1345   #7891   https://xxyyzz.com/reviews/7891
                                          205657  https://zzyyyxxx/reviews/205657
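
For anyone who wants to sanity-check the split-then-extract idea outside Splunk, here is a rough Python sketch. It is only an analogue of the kv + foreach/rex steps, not a description of how Splunk implements them. The helper names (split_pairs, extract) and the literal " - " and ": " separators are my own assumptions based on the sample events.

```python
import re

# Pulls the href target and the link text out of one "<a href=...>text</a>" value.
ANCHOR_RE = re.compile(r'href="(?P<url>[^"]+)">(?P<value>[^<]+)')


def split_pairs(event):
    """Split 'Label: value - Label: value' into {label: value} (assumed separators)."""
    pairs = {}
    for chunk in event.split(" - "):
        label, sep, value = chunk.partition(": ")
        if sep:  # skip chunks that have no 'Label: ' prefix
            pairs[label.strip()] = value.strip()
    return pairs


def extract(event, labels=("Change", "Review")):
    """Return <label> and <label>_url for every label present in the event."""
    pairs = split_pairs(event)
    fields = {}
    for label in labels:
        match = ANCHOR_RE.search(pairs.get(label, ""))
        if match:
            fields[label] = match.group("value")
            fields[label + "_url"] = match.group("url")
    return fields


events = [
    'Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - '
    'Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>',
    'Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>',
]

for event in events:
    print(extract(event))
```

An event without a Change part simply yields no Change or Change_url keys, which corresponds to the empty cells in the last row of the table.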


gcusello
SplunkTrust

Hi @firoagni ,

You have to use two regexes because part of the event may be missing, so please try this:

<your_search>
| rex "Change:\s*\<a href\=\"(?<change_url>[^\"]*)\"\>(?<change>[^\<]*)"
| rex "Review:\s*\<a href\=\"(?<review_url>[^\"]*)\"\>(?<review>[^\<]*)"
| table change_url change review_url review

You can test these regexes at https://regex101.com/r/Vnsxl9/1 and https://regex101.com/r/Vnsxl9/2
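
As a quick illustration of why two independent patterns cope with a missing part, here is a minimal Python sketch run against the three sample events from the question. Python spells named groups (?P<name>...) rather than (?<name>...), and the function name extract_fields is hypothetical.

```python
import re

# Sample events copied from the question.
EVENTS = [
    'Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - '
    'Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>',
    'Change: <a href="https://xxyyzz.com/changes/1345">#1345</a> - '
    'Review: <a href="https://xxyyzz.com/reviews/7891">#7891</a>',
    'Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>',
]

# One pattern per label, so a missing label only skips its own fields.
CHANGE_RE = re.compile(r'Change:\s*<a href="(?P<change_url>[^"]*)">(?P<change>[^<]*)')
REVIEW_RE = re.compile(r'Review:\s*<a href="(?P<review_url>[^"]*)">(?P<review>[^<]*)')


def extract_fields(event):
    """Apply each pattern independently and merge whatever matched."""
    fields = {}
    for pattern in (CHANGE_RE, REVIEW_RE):
        match = pattern.search(event)
        if match:
            fields.update(match.groupdict())
    return fields


rows = [extract_fields(event) for event in EVENTS]
for row in rows:
    print(row)
```

A single combined pattern that requires both the Change and the Review part would not match the third event at all, whereas the two separate patterns still return review_url and review for it.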

Ciao.

Giuseppe

gcusello
SplunkTrust

Hi @firoagni ,

Good for you, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉
