Splunk Search

How to extract URL and text after a string?

firoagni
Engager

Hi,

I would like to extract fields from an unstructured data that contain multiple labels followed by its HTML href tag:

Sample events:

 

 

Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>

Change: <a href="https://xxyyzz.com/changes/1345">#1345</a> - Review: <a href="https://xxyyzz.com/reviews/7891">#7891</a>

Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>

 

 

I wish to get results for the above data as follows:

 

 

change_url                       change review_url                      review
https://xxyyzz.com/changes/12345 #12345 https://xxyyzz.com/reviews/7890 #7890 
https://xxyyzz.com/changes/1345  #1345  https://xxyyzz.com/reviews/7891 #7891
                                        https://zzyyyxxx/reviews/205657 #205657

 

 

Can someone suggest how can I use rex to obtain the above fields? 

Labels (4)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

Try

 

| kv pairdelim="-" kvdelim=":\s"
| foreach Change Review
    [rex field=<<FIELD>> "href=(?<<<FIELD>>_url>[^\>]+)>(?<<<FIELD>>_value>[^\<]+)"]

 

This is an emulation that you can play with and compare with real data

 

| makeresults
| eval data = mvappend("Change: <a href=\"https://xxyyzz.com/changes/12345\">#12345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7890\">#7890</a>",
"Change: <a href=\"https://xxyyzz.com/changes/1345\">#1345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7891\">#7891</a>",
"Review: <a href=\"https://zzyyyxxx/reviews/205657\">205657</a>")
| mvexpand data
| rename data AS _raw
``` data emulation above ```

 

Put the two together, I get

ChangeChange_urlReviewReview_url
#12345https://xxyyzz.com/changes/12345#7890https://xxyyzz.com/reviews/7890
#1345https://xxyyzz.com/changes/1345#7891https://xxyyzz.com/reviews/7891
  205657https://zzyyyxxx/reviews/205657

View solution in original post

Tags (2)
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @firoagni ,

you have to use two regexes because there's the possibility that a part of the event is missing, so please try this:

<your_search>
| rex "Change:\s*\<a href\=\"(?<change_url>[^\"]*)\"\>(?<change>[^\<]*)"
| rex "Review: <a href="(?<review_url>[^\"]*)\"\>(?<review>[^\>]*)"
| table change_url change review_url review

you can test these regexes at https://regex101.com/r/Vnsxl9/1 and https://regex101.com/r/Vnsxl9/2

Ciao.

Giuseppe

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Try

 

| kv pairdelim="-" kvdelim=":\s"
| foreach Change Review
    [rex field=<<FIELD>> "href=(?<<<FIELD>>_url>[^\>]+)>(?<<<FIELD>>_value>[^\<]+)"]

 

This is an emulation that you can play with and compare with real data

 

| makeresults
| eval data = mvappend("Change: <a href=\"https://xxyyzz.com/changes/12345\">#12345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7890\">#7890</a>",
"Change: <a href=\"https://xxyyzz.com/changes/1345\">#1345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7891\">#7891</a>",
"Review: <a href=\"https://zzyyyxxx/reviews/205657\">205657</a>")
| mvexpand data
| rename data AS _raw
``` data emulation above ```

 

Put the two together, I get

ChangeChange_urlReviewReview_url
#12345https://xxyyzz.com/changes/12345#7890https://xxyyzz.com/reviews/7890
#1345https://xxyyzz.com/changes/1345#7891https://xxyyzz.com/reviews/7891
  205657https://zzyyyxxx/reviews/205657
Tags (2)
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @firoagni ,

good for you, see next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Think Like an Architect: Introducing the Splunk Certified Cybersecurity Defense ...

In cybersecurity, defenders respond to threats. Architects design the systems that stop them.    As ...

Best Practices: Splunk auto adjust pipeline queue

When you enable autoAdjustQueue in Splunk, maxSize should be understood as the queue size Splunk starts with ...

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...