Splunk Cloud Platform

regex to capture the html tags in the raw data

Splunkerninja
Path Finder

Hi,

I have html tags like <p> <br> <a href="www.google/com target=_blank"> & so on in my raw data, I want to capture everything except these html tags . Please help me with regex

sample raw data

A flaw in the way Internet Explorer handles a specific HTTP request could allow arbitrary code to execute in the context of the logged-on user, should the
<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>

Tags (4)
0 Karma

richgalloway
SplunkTrust
SplunkTrust

Rather than extract everything *except* the tags, why not remove the tags and keep what's left?

| rex mode=sed "s/\<[^\>]+>//g"
---
If this reply helps you, Karma would be appreciated.

splunkerninja1
Explorer

@richgalloway . This expression is not removing the tags from the raw data

 

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex mode=sed "s/\<[^\>]+>//g"
0 Karma

richgalloway
SplunkTrust
SplunkTrust

The rex command defaults to the _raw field.  Other fields must be explicitly referenced.  The following works in my sandbox.

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex field=message mode=sed "s/\<[^\>]+>//g"

 

---
If this reply helps you, Karma would be appreciated.
Get Updates on the Splunk Community!

.conf24 | Day 0

Hello Splunk Community! My name is Chris, and I'm based in Canberra, Australia's capital, and I travelled for ...

Enhance Security Visibility with Splunk Enterprise Security 7.1 through Threat ...

(view in My Videos)Struggling with alert fatigue, lack of context, and prioritization around security ...

Troubleshooting the OpenTelemetry Collector

  In this tech talk, you’ll learn how to troubleshoot the OpenTelemetry collector - from checking the ...