Splunk Search

How to extract URL and text after a string?

firoagni
Engager

Hi,

I would like to extract fields from unstructured data in which each of several labels is followed by an HTML anchor (href) tag:

Sample events:

Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>

Change: <a href="https://xxyyzz.com/changes/1345">#1345</a> - Review: <a href="https://xxyyzz.com/reviews/7891">#7891</a>

Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>

I would like to get the following results from the data above:

change_url                       change review_url                      review
https://xxyyzz.com/changes/12345 #12345 https://xxyyzz.com/reviews/7890 #7890 
https://xxyyzz.com/changes/1345  #1345  https://xxyyzz.com/reviews/7891 #7891
                                        https://zzyyyxxx/reviews/205657 #205657

Can someone suggest how I can use rex to extract the fields above?

1 Solution

yuanliu
SplunkTrust

Try:

| kv pairdelim="-" kvdelim=":\s"
| foreach Change Review
    [rex field=<<FIELD>> "href=(?<<<FIELD>>_url>[^\>]+)>(?<<<FIELD>>_value>[^\<]+)"]

This is an emulation that you can play with and compare with real data:

| makeresults
| eval data = mvappend("Change: <a href=\"https://xxyyzz.com/changes/12345\">#12345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7890\">#7890</a>",
"Change: <a href=\"https://xxyyzz.com/changes/1345\">#1345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7891\">#7891</a>",
"Review: <a href=\"https://zzyyyxxx/reviews/205657\">205657</a>")
| mvexpand data
| rename data AS _raw
``` data emulation above ```

Putting the two together, I get:

Change  Change_url                        Review  Review_url
#12345  https://xxyyzz.com/changes/12345  #7890   https://xxyyzz.com/reviews/7890
#1345   https://xxyyzz.com/changes/1345   #7891   https://xxyyzz.com/reviews/7891
                                          205657  https://zzyyyxxx/reviews/205657


gcusello
SplunkTrust

Hi @firoagni,

You have to use two separate regexes, because part of the event (the Change or the Review section) may be missing. Please try this:

<your_search>
| rex "Change:\s*\<a href\=\"(?<change_url>[^\"]*)\"\>(?<change>[^\<]*)"
| rex "Review:\s*\<a href\=\"(?<review_url>[^\"]*)\"\>(?<review>[^\<]*)"
| table change_url change review_url review

You can test these regexes at https://regex101.com/r/Vnsxl9/1 and https://regex101.com/r/Vnsxl9/2.
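
For anyone who wants to try the two-rex approach without indexed data, here is a minimal sketch that runs two of the sample events from the question through both extractions. It borrows the makeresults emulation pattern from the accepted answer, and the regexes are a lightly simplified form of the ones above (the <, > and = characters are left unescaped, which PCRE allows), so treat it as a starting point rather than a verified solution.

```spl
| makeresults
| eval data = mvappend("Change: <a href=\"https://xxyyzz.com/changes/12345\">#12345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7890\">#7890</a>",
    "Review: <a href=\"https://zzyyyxxx/reviews/205657\">205657</a>")
| mvexpand data
| rename data AS _raw
| rex "Change:\s*<a href=\"(?<change_url>[^\"]*)\">(?<change>[^<]*)"
| rex "Review:\s*<a href=\"(?<review_url>[^\"]*)\">(?<review>[^<]*)"
| table change_url change review_url review
```

Because each rex runs independently against _raw, an event that contains only the Review part should still get review_url and review, with change_url and change simply left empty.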

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @firoagni,

Good for you, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉
