Splunk Cloud Platform

Regex to capture everything except the HTML tags in raw data

Splunkerninja
Path Finder

Hi,

I have HTML tags such as <p>, <br>, <a href="www.google.com" target="_blank">, and so on in my raw data. I want to capture everything except these HTML tags. Please help me with a regex.

Sample raw data:

A flaw in the way Internet Explorer handles a specific HTTP request could allow arbitrary code to execute in the context of the logged-on user, should the
<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>


richgalloway
SplunkTrust

Rather than extract everything *except* the tags, why not remove the tags and keep what's left?

| rex mode=sed "s/\<[^\>]+>//g"
---
If this reply helps you, Karma would be appreciated.

splunkerninja1
Explorer

@richgalloway, this expression is not removing the tags from the raw data. Here is the search I tested:

 

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex mode=sed "s/\<[^\>]+>//g"

richgalloway
SplunkTrust

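The rex command operates on the _raw field by default. To run it against any other field, you must name that field with the field= option. Your test search puts the text in a field called message, so rex never looked at it. The following works in my sandbox: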

| makeresults 
| eval message="<UL>
<LI>
The first vulnerability occurs because Internet Explorer does not correctly determine an obr in a pop-up window.</LI>
<LI>
The t type that is returned from a Web server during XML data binding.</LI>
</UL>
<P>
&quot;Location: URL:ms-its:C:WINDOWSHelpiexplore.::/itsrt.htm&quot;
<P>
:<P><A HREF='http://blogs.msdn.com/embres/archive/20/81.aspx' TARGET='_blank'>October Security Updates are (finally) available!</A><BR>" 
| rex field=message mode=sed "s/\<[^\>]+>//g"
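
To see the default-field behavior side by side, here is a minimal sketch using a made-up one-line test event (the sample string is purely illustrative and is not taken from the data above). It strips tags from _raw with rex, which needs no field= option, and from a named field with eval and replace(), which is an alternative to the sed form rather than the method used above.

```spl
| makeresults
| eval _raw="<P>Hello <B>world</B></P>", message=_raw
| rex mode=sed "s/<[^>]+>//g"
| eval message=replace(message, "<[^>]+>", "")
| table _raw message
```

Both columns should come back with the tags removed. Note that entities such as &quot; are not tags, so neither form touches them; they would need a separate replacement if you want them decoded.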

 
