Splunk Cloud Platform

Regex to capture everything except HTML tags (</> <P>)

splunkerninja1
Explorer

I need to capture everything except HTML tags such as </a>, <a>, </p>, and </b>. These tags may appear anywhere in the raw data.

I came up with a regex that matches the tags in a non-capturing group, (?:<\/?\w>), but I am stuck on how to capture everything else in the raw data.

Sample:

 Explorer is a web-browser developed by Microsoft which is included in Microsoft Windows Operating Systems.<P>
Microsoft has released Cumulative Security Updates for Internet Explorer which addresses various vulnerabilities found in Internet Explorer 8 (IE 8), Internet Explorer 9 (IE 9), Internet Explorer 10 (IE 10) and Internet Explorer 11 (IE 11). <P>

KB Articles associated with the Update:<P>
1) 4908777<BR>
2) 879586<BR>
3) 9088783<BR>
4) 789792<BR>
5) 0973782<BR>
6) 098781<BR>
7) 1234788<BR>
8) 8907799<BR><BR>

Please Note - CVE-2020-9090 required extra steps to be manually applied for being fully patched. Please refer to the FAQ seciton for <A HREF='https://portal.mtyb.windows.com/en-PK/WINDOWS-guidance/advisory/CVE-2020-9090 ' TARGET='_blank'>CVE-2020-9090 .</A><P>

QID Detection Logic (Authenticated):<BR>

Additionally the QID checks if the required Registry Keys are enabled to fully patch  <A HREF='https://portal.msrc.windows.com/en-US/guidance/advisory/CVE-2014-82789' TARGET='_blank'>CVE-2014-2897.</A> (See FAQ Section) <BR>

The keys to be patched are: <BR>
&quot;whkl\SOFTWARE\Microsoft\Internet Explorer\Main\FEATURE_ENABLE_PASTE_INFO_DISCLOSURE_FIX&quot; value &quot;iexplore.exe&quot; set to &quot;1&quot;.<BR>

ITWhisperer
SplunkTrust
| rex field=_raw mode=sed "s/<\/?\w+.*?\/?>//g"
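To see what this substitution does outside of Splunk, here is a rough Python sketch of the same pattern. It assumes Python's `re` module behaves like Splunk's regex engine for this particular expression, and the `strip_tags` helper and the placeholder URL are made up for illustration. It also shows why the pattern from the question, `<\/?\w>`, falls short: it only matches tags whose name is a single character and that carry no attributes.

```python
import re

# Same pattern as in the rex sed expression: "<", an optional "/",
# a tag name, then lazily anything up to the closing ">".
TAG_PATTERN = re.compile(r"</?\w+.*?/?>")

# Pattern from the question: only single-character tag names, no attributes.
NARROW_PATTERN = re.compile(r"</?\w>")


def strip_tags(text: str) -> str:
    """Hypothetical helper: delete every substring that looks like an HTML tag."""
    return TAG_PATTERN.sub("", text)


# The URL below is a placeholder, not one taken from the sample data.
sample = (
    "8) 8907799<BR><BR>\n"
    "Please refer to <A HREF='https://example.invalid/advisory' "
    "TARGET='_blank'>CVE-2020-9090 .</A><P>"
)

print(strip_tags(sample))
# 8) 8907799
# Please refer to CVE-2020-9090 .

# The narrow pattern removes <P> but leaves <BR> behind.
print(NARROW_PATTERN.sub("", "a<P>b<BR>c"))
# ab<BR>c
```

Note that the pattern only removes `<...>` tags. HTML entities such as `&quot;` in the sample are left as they are, and in this sketch a tag that spans a line break would not be matched because `.` does not match a newline by default.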


splunkerninja1
Explorer

@ITWhisperer Thank you. I have one more issue: I need to use the same regex on two different fields, but it throws an error when I run the query below.

| inputlookup remediation.csv 
| stats count by knowbe4, solution 
| rex field=knowbe4 mode=sed "s/<\/?\w+.*?\/?>//g" rex field=solution mode=sed "s/<\/?\w+.*?\/?>//g"

ITWhisperer
SplunkTrust

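You need to use two separate rex commands. Each rex invocation works on a single field, so the second one has to be piped as its own command: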

| inputlookup remediation.csv 
| stats count by knowbe4, solution 
| rex field=knowbe4 mode=sed "s/<\/?\w+.*?\/?>//g"
| rex field=solution mode=sed "s/<\/?\w+.*?\/?>//g"
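
The same idea, one substitution per field, can be sketched in Python. This is only an illustration of the logic, not a Splunk feature: the `strip_tags_from_fields` helper and the sample row are invented, and only the field names `knowbe4` and `solution` come from the query above.

```python
import re

# Same tag pattern as in the two rex commands above.
TAG_PATTERN = re.compile(r"</?\w+.*?/?>")


def strip_tags_from_fields(row: dict, fields: list) -> dict:
    """Hypothetical helper: apply the tag substitution to each named field in turn."""
    cleaned = dict(row)  # copy so the input row is left untouched
    for name in fields:
        value = cleaned.get(name)
        if isinstance(value, str):
            cleaned[name] = TAG_PATTERN.sub("", value)
    return cleaned


# Invented row for illustration; the values are not from remediation.csv.
row = {"knowbe4": "Patch now<P>", "solution": "See <A HREF='x'>FAQ</A><BR>", "count": 1}

print(strip_tags_from_fields(row, ["knowbe4", "solution"]))
# {'knowbe4': 'Patch now', 'solution': 'See FAQ', 'count': 1}
```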