Splunk Cloud Platform

regex to capture the html tags in the raw data

Splunkerninja
Path Finder

Hi,

I have HTML tags such as <p>, <br>, <a href="www.google.com" target="_blank"> and so on in my raw data. I want to capture everything except these HTML tags. Please help me with a regex.

sample raw data

A flaw in the way Internet Explorer handles a specific HTTP request could allow arbitrary code to execute in the context of the logged-on user, should the
<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>


richgalloway
SplunkTrust

Rather than extract everything *except* the tags, why not remove the tags and keep what's left?

| rex mode=sed "s/\<[^\>]+>//g"
---
If this reply helps you, Karma would be appreciated.

splunkerninja1
Explorer

@richgalloway, this expression is not removing the tags from my data. Here is what I ran:

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex mode=sed "s/\<[^\>]+>//g"

richgalloway
SplunkTrust

The rex command operates on the _raw field by default. Your test puts the text in a field called message, so that field must be named explicitly with field=message. The following works in my sandbox.

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex field=message mode=sed "s/\<[^\>]+>//g"
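
If you prefer eval over rex, a similar result should be possible with the replace() function, which also takes a regular expression. The sketch below is only an illustration: the short message value and the example.com link are made up for the demo, and the second replace() is an optional extra that turns the &quot; entities (which the tag pattern does not touch) back into plain double quotes.

```
| makeresults
| eval message="<P>&quot;Location&quot;<BR><A HREF='http://example.com' TARGET='_blank'>link text</A>"
``` strip anything that looks like <...> ```
| eval message=replace(message, "<[^>]+>", "")
``` optional: decode the &quot; entity left behind ```
| eval message=replace(message, "&quot;", "\"")
```

Either approach only removes things shaped like tags. Text that legitimately contains a < followed later by a > would be removed as well, so check the output against a few real events first.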

 

---
If this reply helps you, Karma would be appreciated.