Splunk Cloud Platform

Regex to capture the HTML tags in the raw data

Splunkerninja
Path Finder

Hi,

I have HTML tags such as <p>, <br>, and <a href="www.google/com target=_blank"> in my raw data. I want to capture everything except these HTML tags. Please help me with the regex.

Sample raw data:

A flaw in the way Internet Explorer handles a specific HTTP request could allow arbitrary code to execute in the context of the logged-on user, should the
<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>


richgalloway
SplunkTrust

Rather than extract everything *except* the tags, why not remove the tags and keep what's left?

| rex mode=sed "s/\<[^\>]+>//g"
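To sanity-check the pattern outside Splunk, here is a minimal Python sketch of the same substitution. It assumes that Python's `re` module and the PCRE engine behind rex treat this simple pattern the same way. That holds here because `<` and `>` are not special characters in either engine, so the backslashes in `\<` and `\>` can be dropped. `strip_tags` is a hypothetical helper name, not part of Splunk.

```python
import re

# "<", then one or more characters that are not ">", then ">".
# Same idea as the sed expression s/\<[^\>]+>//g used with rex mode=sed.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Hypothetical helper: delete every <...> tag and keep the text between tags."""
    return TAG_PATTERN.sub("", text)


# Last line of the sample raw data from the question.
sample = ("<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' "
          "TARGET='_blank'>October Security Updates are (finally) available!</A><BR>")
print(strip_tags(sample))
# prints: October Security Updates are (finally) available!
```

One limitation of `[^>]+` is that a `>` inside a quoted attribute value would end the match early. That case does not occur in the sample data.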
---
If this reply helps you, Karma would be appreciated.

splunkerninja1
Explorer

@richgalloway, this expression is not removing the tags from the raw data:

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex mode=sed "s/\<[^\>]+>//g"

richgalloway
SplunkTrust

The rex command operates on the _raw field by default. To apply it to any other field, such as message here, name that field explicitly with field=. The following works in my sandbox:

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex field=message mode=sed "s/\<[^\>]+>//g"
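The difference between the two searches comes down to which field the substitution targets. The Python sketch below is a rough model, not Splunk's implementation. An event is represented as a dict, and the hypothetical `rex_sed` helper edits `_raw` unless another field name is passed, mirroring the default described above. The event is assumed to hold only `_time` and `message`, which is what makeresults followed by eval normally produces (makeresults does not generate a `_raw` field by default).

```python
import re

TAG_PATTERN = re.compile(r"<[^>]+>")


def rex_sed(event: dict, field: str = "_raw") -> dict:
    """Hypothetical model of: rex [field=<field>] mode=sed "s/<[^>]+>//g".

    Targets _raw unless another field is named. If the target field is
    missing, the event comes back unchanged.
    """
    result = dict(event)  # copy, so the input event is not modified
    if field in result:
        result[field] = TAG_PATTERN.sub("", result[field])
    return result


# Roughly what `| makeresults | eval message="..."` yields: _time plus message, no _raw.
event = {
    "_time": 0,
    "message": "<LI>\nThe t type that is returned from a Web server during XML data binding.</LI>",
}

# Without field=message the substitution targets _raw, so message keeps its tags.
print(rex_sed(event)["message"])

# With field=message the tags are stripped from message.
print(rex_sed(event, field="message")["message"])
```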
