How to extract URL and text after a string?

firoagni
Engager

Hi,

I would like to extract fields from unstructured data that contains multiple labels, each followed by an HTML anchor tag with an href:

Sample events:

 

 

Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>

Change: <a href="https://xxyyzz.com/changes/1345">#1345</a> - Review: <a href="https://xxyyzz.com/reviews/7891">#7891</a>

Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>

 

 

I wish to get results for the above data as follows:

 

 

change_url                       change review_url                      review
https://xxyyzz.com/changes/12345 #12345 https://xxyyzz.com/reviews/7890 #7890 
https://xxyyzz.com/changes/1345  #1345  https://xxyyzz.com/reviews/7891 #7891
                                        https://zzyyyxxx/reviews/205657 #205657

 

 

Can someone suggest how I can use rex to obtain the above fields?


yuanliu
SplunkTrust
SplunkTrust

Try

 

| kv pairdelim="-" kvdelim=":\s"
| foreach Change Review
    [rex field=<<FIELD>> "href=(?<<<FIELD>>_url>[^\>]+)>(?<<<FIELD>>_value>[^\<]+)"]

 

Here is a data emulation that you can play with and compare with real data:

 

| makeresults
| eval data = mvappend("Change: <a href=\"https://xxyyzz.com/changes/12345\">#12345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7890\">#7890</a>",
"Change: <a href=\"https://xxyyzz.com/changes/1345\">#1345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7891\">#7891</a>",
"Review: <a href=\"https://zzyyyxxx/reviews/205657\">205657</a>")
| mvexpand data
| rename data AS _raw
``` data emulation above ```

 

Putting the two together, I get:

Change  Change_url                        Review  Review_url
#12345  https://xxyyzz.com/changes/12345  #7890   https://xxyyzz.com/reviews/7890
#1345   https://xxyyzz.com/changes/1345   #7891   https://xxyyzz.com/reviews/7891
                                          205657  https://zzyyyxxx/reviews/205657
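
For readers who want to see the idea outside Splunk: the approach splits each event into label/value pairs first, then runs one regex per label. The Python below is only a rough sketch of that idea under my own assumptions. It does not reproduce how Splunk's kv (extract) command actually tokenizes events, it splits on " - " and ": " rather than on the bare delimiters, and the names `HREF_RE`, `split_pairs` and `extract` are made up for illustration.

```python
import re

# Hypothetical names; this approximates the kv + foreach idea, not Splunk's own parser.
HREF_RE = re.compile(r'href="(?P<url>[^"]+)">(?P<value>[^<]+)')

def split_pairs(event: str) -> dict:
    """Split 'Label: <a ...>' segments on ' - ' and ': ' into a label -> HTML mapping."""
    pairs = {}
    for segment in event.split(" - "):
        label, _, html = segment.partition(": ")
        pairs[label.strip()] = html
    return pairs

def extract(event: str) -> dict:
    """For each label, pull the link target and the link text out of its anchor tag."""
    row = {}
    for label, html in split_pairs(event).items():
        match = HREF_RE.search(html)
        if match:
            row[f"{label}_url"] = match.group("url")
            row[label] = match.group("value")
    return row

print(extract('Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>'))
```

Because the labels are discovered from the data, a new label in the events would get its own pair of fields without another pattern, which is the main appeal of this style over one hard-coded regex per label.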


gcusello
SplunkTrust
SplunkTrust

Hi @firoagni ,

You need two separate regexes, because either part of the event (Change or Review) may be missing. Please try this:

<your_search>
| rex "Change:\s*\<a href\=\"(?<change_url>[^\"]*)\"\>(?<change>[^\<]*)"
| rex "Review:\s*\<a href\=\"(?<review_url>[^\"]*)\"\>(?<review>[^\<]*)"
| table change_url change review_url review

you can test these regexes at https://regex101.com/r/Vnsxl9/1 and https://regex101.com/r/Vnsxl9/2
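
The two patterns can also be sanity-checked locally. This is a minimal Python sketch, assuming Python's `re` module behaves like Splunk's PCRE for patterns this simple. Named groups are written `(?P<name>...)` because that is Python's syntax, and `extract_fields` is a hypothetical helper, not a Splunk function. Each pattern is applied independently, so an event that has only the Review part still yields the review fields.

```python
import re

# Python equivalents of the two rex patterns (Python spells named groups as (?P<name>...)).
CHANGE_RE = re.compile(r'Change:\s*<a href="(?P<change_url>[^"]*)">(?P<change>[^<]*)')
REVIEW_RE = re.compile(r'Review:\s*<a href="(?P<review_url>[^"]*)">(?P<review>[^<]*)')

def extract_fields(event: str) -> dict:
    """Hypothetical helper: return whichever of the four fields are present in the event."""
    fields = {}
    for pattern in (CHANGE_RE, REVIEW_RE):
        match = pattern.search(event)
        if match:
            fields.update(match.groupdict())
    return fields

# The three sample events from the question.
events = [
    'Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>',
    'Change: <a href="https://xxyyzz.com/changes/1345">#1345</a> - Review: <a href="https://xxyyzz.com/reviews/7891">#7891</a>',
    'Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>',
]

for event in events:
    print(extract_fields(event))
```

Note that the third sample event has no "#" in its link text, so `review` comes back as 205657 rather than #205657.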

Ciao.

Giuseppe

gcusello
SplunkTrust
SplunkTrust

Hi @firoagni ,

Good for you, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉
