How to extract URL and text after a string?

firoagni
Engager

Hi,

I would like to extract fields from unstructured data that contains multiple labels, each followed by an HTML anchor tag (<a href=...>):

Sample events:

 

 

Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>

Change: <a href="https://xxyyzz.com/changes/1345">#1345</a> - Review: <a href="https://xxyyzz.com/reviews/7891">#7891</a>

Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>

 

 

I wish to get results for the above data as follows:

 

 

change_url                       change review_url                      review
https://xxyyzz.com/changes/12345 #12345 https://xxyyzz.com/reviews/7890 #7890 
https://xxyyzz.com/changes/1345  #1345  https://xxyyzz.com/reviews/7891 #7891
                                        https://zzyyyxxx/reviews/205657 #205657

 

 

Can someone suggest how I can use rex to obtain the above fields?

yuanliu
SplunkTrust
SplunkTrust

Try

 

| kv pairdelim="-" kvdelim=":\s"
| foreach Change Review
    [rex field=<<FIELD>> "href=(?<<<FIELD>>_url>[^\>]+)>(?<<<FIELD>>_value>[^\<]+)"]

 

Here is an emulation that you can play with and compare with real data:

 

| makeresults
| eval data = mvappend("Change: <a href=\"https://xxyyzz.com/changes/12345\">#12345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7890\">#7890</a>",
"Change: <a href=\"https://xxyyzz.com/changes/1345\">#1345</a> - Review: <a href=\"https://xxyyzz.com/reviews/7891\">#7891</a>",
"Review: <a href=\"https://zzyyyxxx/reviews/205657\">205657</a>")
| mvexpand data
| rename data AS _raw
``` data emulation above ```

 

Putting the two together, I get:

Change  Change_url                        Review  Review_url
#12345  https://xxyyzz.com/changes/12345  #7890   https://xxyyzz.com/reviews/7890
#1345   https://xxyyzz.com/changes/1345   #7891   https://xxyyzz.com/reviews/7891
                                          205657  https://zzyyyxxx/reviews/205657
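
For anyone who wants to see the idea outside Splunk, here is a rough Python sketch of the same split-then-match approach: break the event into "Label: value" pairs, then pull the URL and link text out of each value. It is only an analogue under stated assumptions. The helper names (split_pairs, extract_links) are made up for illustration, the separators " - " and ": " are taken from the sample events, and it does not reproduce how kv actually parses events. It also matches the quotes around the URL explicitly.

```python
import re

# Matches href="URL">TEXT inside one label's value.
HREF_RE = re.compile(r'href="(?P<url>[^"]+)">(?P<value>[^<]+)')


def split_pairs(event, pair_sep=" - ", kv_sep=": "):
    """Split 'Label: value - Label: value' into a {label: value} dict.

    Hypothetical helper; separators are assumptions based on the sample events.
    """
    pairs = {}
    for chunk in event.split(pair_sep):
        label, sep, value = chunk.partition(kv_sep)
        if sep:  # ignore chunks that contain no 'label: value' separator
            pairs[label.strip()] = value.strip()
    return pairs


def extract_links(event, labels=("Change", "Review")):
    """Return <label>_url and <label>_value for every label present in the event."""
    pairs = split_pairs(event)
    row = {}
    for label in labels:
        match = HREF_RE.search(pairs.get(label, ""))
        if match:
            row[label + "_url"] = match.group("url")
            row[label + "_value"] = match.group("value")
    return row


SAMPLE_EVENTS = [
    'Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - '
    'Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>',
    'Change: <a href="https://xxyyzz.com/changes/1345">#1345</a> - '
    'Review: <a href="https://xxyyzz.com/reviews/7891">#7891</a>',
    'Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>',
]

if __name__ == "__main__":
    for event in SAMPLE_EVENTS:
        print(extract_links(event))
```

An event that has only a Review label simply produces no Change keys, which mirrors the empty cells in the last row above.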

gcusello
SplunkTrust
SplunkTrust

Hi @firoagni ,

You need two separate regexes, because part of the event (the Change or the Review) may be missing. Please try this:

<your_search>
| rex "Change:\s*\<a href\=\"(?<change_url>[^\"]*)\"\>(?<change>[^\<]*)"
| rex "Review:\s*\<a href\=\"(?<review_url>[^\"]*)\"\>(?<review>[^\<]*)"
| table change_url change review_url review

You can test these regexes at https://regex101.com/r/Vnsxl9/1 and https://regex101.com/r/Vnsxl9/2
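
If you prefer to check the two patterns offline against the sample events, a minimal Python sketch could look like the following. It is only an illustration, not part of the Splunk search: the function name extract_fields is made up, and Python writes named groups as (?P<name>...) where rex uses (?<name>...).

```python
import re

# Python spells named groups (?P<name>...); Splunk's rex uses (?<name>...).
CHANGE_RE = re.compile(r'Change:\s*<a href="(?P<change_url>[^"]*)">(?P<change>[^<]*)')
REVIEW_RE = re.compile(r'Review:\s*<a href="(?P<review_url>[^"]*)">(?P<review>[^<]*)')

FIELDS = ("change_url", "change", "review_url", "review")


def extract_fields(event):
    """Apply both patterns independently (hypothetical helper).

    A label that is missing from the event leaves its fields as empty strings.
    """
    row = dict.fromkeys(FIELDS, "")
    for pattern in (CHANGE_RE, REVIEW_RE):
        match = pattern.search(event)
        if match:
            row.update(match.groupdict())
    return row


SAMPLE_EVENTS = [
    'Change: <a href="https://xxyyzz.com/changes/12345">#12345</a> - '
    'Review: <a href="https://xxyyzz.com/reviews/7890">#7890</a>',
    'Change: <a href="https://xxyyzz.com/changes/1345">#1345</a> - '
    'Review: <a href="https://xxyyzz.com/reviews/7891">#7891</a>',
    'Review: <a href="https://zzyyyxxx/reviews/205657">205657</a>',
]

if __name__ == "__main__":
    for event in SAMPLE_EVENTS:
        print(extract_fields(event))
```

Because each pattern is searched on its own, the third event still yields review_url and review while change_url and change stay empty, which is the reason for using two rex commands instead of one combined regex.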

Ciao.

Giuseppe

gcusello
SplunkTrust
SplunkTrust

Hi @firoagni ,

Good for you, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉
