Splunk Cloud Platform

Regex to capture everything except the HTML tags in the raw data

Splunkerninja
Path Finder

Hi,

I have HTML tags such as <p>, <br>, <a href="www.google.com" target="_blank"> and so on in my raw data. I want to capture everything except these HTML tags. Please help me with a regex.

Sample raw data:

A flaw in the way Internet Explorer handles a specific HTTP request could allow arbitrary code to execute in the context of the logged-on user, should the
<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>

richgalloway
SplunkTrust

Rather than extract everything *except* the tags, why not remove the tags and keep what's left?

| rex mode=sed "s/\<[^\>]+>//g"
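
To unpack that expression: in sed mode, s/<pattern>/<replacement>/g replaces every match of the pattern, and here the replacement is empty, so each match is deleted. The pattern matches a "<", then one or more characters that are not ">", then the closing ">". A minimal sketch of the same idea on a made-up one-line event (the sample string is an assumption, not the poster's data; the backslashes before < and > are dropped because they should not be needed in PCRE):

```spl
| makeresults
| eval _raw="<P>Hello <B>world</B></P>"
| rex mode=sed "s/<[^>]+>//g"
```

With no field= option, rex rewrites _raw in place, so the tags are stripped from the event text itself.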
---
If this reply helps you, Karma would be appreciated.

splunkerninja1
Explorer

@richgalloway, this expression is not removing the tags from my raw data. Here is the search I tested:

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex mode=sed "s/\<[^\>]+>//g"

richgalloway
SplunkTrust

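The rex command operates on the _raw field by default. To work on any other field, you must name it explicitly with the field= option. Your test search puts the text in a field called message, so rex never touched it. The following works in my sandbox: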

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex field=message mode=sed "s/\<[^\>]+>//g"
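
If the original field should stay untouched, the same pattern can probably be applied with eval's replace() function and written to a new field. This is a sketch under assumptions: the field name clean_message and the shortened sample string are made up for illustration, and the second replace() is an optional extra that turns the &quot; entities seen in the sample data back into double quotes (entities are not tags, so the tag pattern leaves them alone).

```spl
| makeresults
| eval message="<P>&quot;Location&quot;</P><A HREF='x' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>"
``` strip anything that looks like <...> and keep the result in a new field ```
| eval clean_message=replace(message, "<[^>]+>", "")
``` optional: decode the &quot; entity into a literal double quote ```
| eval clean_message=replace(clean_message, "&quot;", "\"")
```

Compared with rex mode=sed, which overwrites the field it is pointed at, this keeps message as-is so the raw and cleaned values can be checked side by side.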

 
