Splunk Cloud Platform

Regex to capture everything except HTML tags (</> <P>)

splunkerninja1
Explorer

I need to capture everything except HTML tags such as </a>, <a>, </p> and </b>. These tags may appear anywhere in the raw data.

I came up with a regex, the non-capturing group (?:<\/?\w>), that matches the tags, but I am stuck on how to capture everything else in the raw data.

Sample:

 Explorer is a web-browser developed by Microsoft which is included in Microsoft Windows Operating Systems.<P>
Microsoft has released Cumulative Security Updates for Internet Explorer which addresses various vulnerabilities found in Internet Explorer 8 (IE 8), Internet Explorer 9 (IE 9), Internet Explorer 10 (IE 10) and Internet Explorer 11 (IE 11). <P>

KB Articles associated with the Update:<P>
1) 4908777<BR>
2) 879586<BR>
3) 9088783<BR>
4) 789792<BR>
5) 0973782<BR>
6) 098781<BR>
7) 1234788<BR>
8) 8907799<BR><BR>

Please Note - CVE-2020-9090 required extra steps to be manually applied for being fully patched. Please refer to the FAQ seciton for <A HREF='https://portal.mtyb.windows.com/en-PK/WINDOWS-guidance/advisory/CVE-2020-9090 ' TARGET='_blank'>CVE-2020-9090 .</A><P>

QID Detection Logic (Authenticated):<BR>

Additionally the QID checks if the required Registry Keys are enabled to fully patch  <A HREF='https://portal.msrc.windows.com/en-US/guidance/advisory/CVE-2014-82789' TARGET='_blank'>CVE-2014-2897.</A> (See FAQ Section) <BR>

The keys to be patched are: <BR>
&quot;whkl\SOFTWARE\Microsoft\Internet Explorer\Main\FEATURE_ENABLE_PASTE_INFO_DISCLOSURE_FIX&quot; value &quot;iexplore.exe&quot; set to &quot;1&quot;.<BR>
1 Solution

ITWhisperer
SplunkTrust
| rex field=_raw mode=sed "s/<\/?\w+.*?\/?>//g"
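
The sed expression above deletes the tags rather than capturing the text around them, which leaves exactly the "everything else" the question asks for. As a rough illustration outside Splunk, the Python sketch below applies the same pattern to a shortened, made-up sample (the `example.com` URL and the `strip_tags` helper are assumptions for the demo, not part of the original data). It also shows why the pattern from the question, `<\/?\w>`, falls short: it only allows a one-character tag name, so it misses `<BR>` and any tag with attributes.

```python
import re

# Same pattern as the sed expression above: "<", an optional "/", a tag name,
# then the shortest run of characters up to the closing ">".
TAG = re.compile(r"</?\w+.*?/?>")

# The pattern from the question: the tag name can only be one character long.
NARROW = re.compile(r"</?\w>")


def strip_tags(text: str) -> str:
    """Delete every tag match, like sed's s/<pattern>//g."""
    return TAG.sub("", text)


# Shortened, hypothetical sample in the style of the raw data above.
sample = (
    "1) 4908777<BR>\n"
    "Please refer to <A HREF='https://example.com/advisory' TARGET='_blank'>CVE-2020-9090 .</A><P>\n"
    "&quot;iexplore.exe&quot; set to &quot;1&quot;.<BR>"
)

cleaned = strip_tags(sample)
narrow_hits = NARROW.findall(sample)

print(cleaned)
print(narrow_hits)  # only the one-letter tags are found
```

Two things to keep in mind: the expression leaves HTML entities such as `&quot;` untouched, and because `.` does not match a newline by default, a tag that is split across two lines would not be removed.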


splunkerninja1
Explorer

@ITWhisperer Thank you. I need to apply the same regex to two different fields, but the query below throws an error when I run it:

| inputlookup remediation.csv 
| stats count by knowbe4, solution 
| rex field=knowbe4 mode=sed "s/<\/?\w+.*?\/?>//g" rex field=solution mode=sed "s/<\/?\w+.*?\/?>//g"

 


ITWhisperer
SplunkTrust

You need two separate rex commands, one per field, each on its own pipe:

| inputlookup remediation.csv 
| stats count by knowbe4, solution 
| rex field=knowbe4 mode=sed "s/<\/?\w+.*?\/?>//g"
| rex field=solution mode=sed "s/<\/?\w+.*?\/?>//g"
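
The reason, as far as I understand it, is that each rex invocation works on a single field with a single sed expression, so a second `rex` without its own pipe is read as stray arguments to the first. The Python sketch below mirrors the corrected query: the same substitution is applied once per field. The field names come from the query above; the row contents and the `strip_fields` helper are made up for illustration.

```python
import re

# Same tag pattern as in the rex commands above.
TAG = re.compile(r"</?\w+.*?/?>")


def strip_fields(row: dict, fields: list) -> dict:
    """Return a copy of row with tags removed from each named field."""
    cleaned = dict(row)
    for name in fields:
        value = cleaned.get(name)
        if isinstance(value, str):
            # One substitution per field, like one rex command per field.
            cleaned[name] = TAG.sub("", value)
    return cleaned


# Hypothetical row shaped like the output of "stats count by knowbe4, solution".
row = {
    "knowbe4": "Apply the update<BR>",
    "solution": "See <A HREF='#faq'>the FAQ</A><P>",
    "count": "1",
}

result = strip_fields(row, ["knowbe4", "solution"])
print(result)
```

Fields that are not listed, such as `count` here, pass through unchanged, which matches what the two rex commands do in the search.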