Can someone help me with regex to remove HTML tags from fields?

ndsouza25
New Member

Hello,

Could someone please help me with removing the HTML tags from fields.

The data is a few sentences, such as remediation of a Microsoft patch, but contains links within.

This data is coming in through a lookup that, apparently, I can't modify. I'd like to get rid of the HTML tags so I can just display the text in a clear format.

Thank you!

0 Karma
1 Solution

horsefez
Motivator

@ndsouza25

I worked on your second request which was a bit more difficult. But I managed to get it.

https://regex101.com/r/q5fPca/3

The SPL command would look something like this:
yourbasesearch | rex mode=sed field=_raw "s/((?=<[^>]>)[^;]+;[^;]+;|<[^>]>|<\/[^>]+>|<[^'"]+['"]|['"][^<]+<[^>]+>)//g"

You can change field=_raw to another field name if you have already extracted this text into another field (optional).

horsefez
Motivator

Unfortunately, I had to delete the "KB43..." value as well, because it would stick to the URL and make the URL invalid.

If you really need that "KB43..." value then hit me up again.
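If the KB value is needed after all, one possible approach is to capture the URL and the link text and reorder them with sed backreferences. This is only a sketch under assumptions: the field name remediation and the "text (URL)" output layout are made up for illustration, and makeresults plus eval are used only to build a one-event test case from the sample data.

```spl
| makeresults
| eval remediation="Customers are advised to follow <A HREF='https://support.microsoft.com/en-ph/help/4343902/security-update-for-adobe-flash-player' TARGET='_blank'>KB4343902</A> for instructions."
| rex mode=sed field=remediation "s/<A HREF=['\"]([^'\"]+)['\"][^>]*>([^<]*)<\/A>/\2 (\1)/g"
| table remediation
```

Here \1 is the URL and \2 is the link text, so each link should come out as the KB value followed by its URL in parentheses. The pattern assumes the tags are written exactly as in the sample (uppercase A HREF, quoted URL); other variants would need the expression adjusted.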

0 Karma

ndsouza25
New Member

Thank you for spending the time! I don't need the KB values, but when I put the SPL command in, I get this error: Mismatched ']'. I see that the regex works, but can't figure out why Splunk complains about it.

0 Karma

horsefez
Motivator

@ndsouza25 you are right, I fixed it 🙂

| rex mode=sed field=_raw "s/((?=<[^>]>)[^;]+;[^;]+;|<[^>]>|<\/[^>]+>|<[^\'\"]+[\'\"]|[\'\"][^<]+<[^>]+>)//g"

The problem was that the " characters inside the expression need to be escaped; otherwise they end the quoted string early and interfere with the engine 🙂

Works now, tested it in Splunk.
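A small way to check the escaped form without touching real data is to build a single test event. This is a sketch, assuming a hypothetical field named remediation. Only the two quote-related alternatives from the command above are used, to keep the test short.

```spl
| makeresults
| eval remediation="<A HREF='https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/ADV180020' TARGET='_blank'>ADV180020</A>"
| rex mode=sed field=remediation "s/(<[^'\"]+['\"]|['\"][^<]+<[^>]+>)//g"
| table remediation
```

If the backslashes before the double quotes are removed, the search should fail to parse in the same way as the original command, which makes it easy to confirm the cause.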

0 Karma

ndsouza25
New Member

It works perfectly, thank you very much pyro_wood!

0 Karma

horsefez
Motivator

I wrote a regex that can at least get you the raw text in the format you wanted (without the hyperlinks actually working).

yourbasesearch | rex mode=sed field=_raw "s/(<[^>]+>|(?<=P>)(?:[^;]+;)+)//g"

The result should look like this afterwards:
Customers are advised to follow KB4343902 for instructions pertaining to the remediation of these vulnerabilities. Following are links for downloading patches to fix the vulnerabilities: ADV180020

https://regex101.com/r/q5fPca/1
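As a variant, here is a sketch that keeps the "Patch:" label and removes only the tags and the encoded break (&lt;br/&gt;). It is an assumption about the desired output, not a tested replacement for the command above. The field name remediation is hypothetical, and makeresults is used only to reproduce the sample event.

```spl
| makeresults
| eval remediation="Customers are advised to follow <A HREF='https://support.microsoft.com/en-ph/help/4343902/security-update-for-adobe-flash-player' TARGET='_blank'>KB4343902</A> for instructions pertaining to the remediation of these vulnerabilities.<P> <P>Patch:&lt;br/&gt; Following are links for downloading patches to fix the vulnerabilities: <P> <A HREF='https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/ADV180020' TARGET='_blank'>ADV180020</A>"
| rex mode=sed field=remediation "s/(<[^>]+>|&lt;[^&]*&gt;)//g"
| rex mode=sed field=remediation "s/\s{2,}/ /g"
| table remediation
```

The first rex drops real tags and entity-encoded tags; the second collapses the runs of spaces left behind. Like the command above, this also discards the URLs, because they sit inside the tags.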

0 Karma

ndsouza25
New Member

Thank you very much! This works great! Is it possible to still display the URL? I don't need it to work as a hyperlink, but just show up so someone can copy and paste it into a browser. I really appreciate the help and quick response!

0 Karma

marycordova
SplunkTrust

Please submit a sample of the data.

@marycordova

ndsouza25
New Member

Customers are advised to follow <A HREF='https://support.microsoft.com/en-ph/help/4343902/security-update-for-adobe-flash-player' TARGET='_blank'>KB4343902</A> for instructions pertaining to the remediation of these vulnerabilities.<P> <P>Patch:&lt;br/&gt; Following are links for downloading patches to fix the vulnerabilities: <P> <A HREF='https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/ADV180020' TARGET='_blank'>ADV180020</A>

0 Karma

ndsouza25
New Member

Above is a sample of the data I get from our vulnerability system. I would like for it to read as such, but actually show the link URL instead of converting to a hyperlink:

Customers are advised to follow KB4343902 for instructions pertaining to the remediation of these vulnerabilities.

Patch: Following are links for downloading patches to fix the vulnerabilities:

ADV180020

0 Karma

horsefez
Motivator

When I read "removing the HTML tags from fields" I immediately thought about regular expressions.
Unfortunately, you don't seem to want to remove them. You want to create a hyperlink.

I'm not sure if I can help you with that. Sorry. 😞

P.S.: I'm not even sure if that is possible at all.

0 Karma

horsefez
Motivator

Hi @ndsouza25, as @marycordova said, please share some sample data. Otherwise we won't be able to help you.

0 Karma