Hello,
Could someone please help me remove the HTML tags from fields?
The data is a few sentences, such as remediation steps for a Microsoft patch, but it contains links.
This data comes in through a lookup that I apparently can't modify. I'd like to get rid of the
HTML tags so I can display just the text in a clear format.
Thank you!
@ndsouza25
I worked on your second request which was a bit more difficult. But I managed to get it.
https://regex101.com/r/q5fPca/3
The SPL command would look something like this:
yourbasesearch | rex mode=sed field=_raw "s/((?=<[^>]>)[^;]+;[^;]+;|<[^>]>|<\/[^>]+>|<[^'"]+['"]|['"][^<]+<[^>]+>)//g"
You can change field=_raw to another field name if you have already extracted this text into another field (optional).
Unfortunately I had to delete the "KB43..." text as well, because it would stick to the URL and make the URL invalid.
If you really need that "KB43..." value then hit me up again.
Thank you for spending the time! I don't need the KB values, but when I put the SPL command in, I get this error: Mismatched ']'. I see that the regex works, but can't figure out why Splunk complains about it.
@ndsouza25 you are right, I fixed it 🙂
| rex mode=sed field=_raw "s/((?=<[^>]>)[^;]+;[^;]+;|<[^>]>|<\/[^>]+>|<[^\'\"]+[\'\"]|[\'\"][^<]+<[^>]+>)//g"
The problem was that I needed to escape the quote characters (\' and \"), as they interfere with the engine 🙂
Works now, tested it in Splunk.
It works perfectly, thank you very much pyro_wood!
I wrote a regex that can at least get you the raw text in the format you wanted (without the hyperlinks actually working)
yourbasesearch | rex mode=sed field=_raw "s/(<[^>]+>|(?<=P>)(?:[^;]+;)+)//g"
The result should look like this afterwards:
Customers are advised to follow KB4343902 for instructions pertaining to the remediation of these vulnerabilities. Following are links for downloading patches to fix the vulnerabilities: ADV180020
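For anyone who wants to check the tag-stripping idea outside Splunk, here is a minimal Python sketch. It reproduces only the `<[^>]+>` alternative of the regex above, run against a shortened copy of the sample posted in this thread. The function name `strip_tags` and the whitespace cleanup are my own additions for illustration, not part of the SPL answer.

```python
import re

# Shortened copy of the sample data posted in this thread.
sample = (
    "Customers are advised to follow "
    "<A HREF='https://support.microsoft.com/en-ph/help/4343902/security-update-for-adobe-flash-player' "
    "TARGET='_blank'>KB4343902</A> for instructions.<P> <P>Patch:<br/> "
    "<A HREF='https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/ADV180020' "
    "TARGET='_blank'>ADV180020</A>"
)


def strip_tags(text: str) -> str:
    """Remove anything that looks like an HTML tag, then tidy whitespace."""
    # Same idea as the first alternative in the sed expression: <[^>]+>
    no_tags = re.sub(r"<[^>]+>", "", text)
    # Hypothetical extra step: collapse the gaps the removed tags leave behind.
    return re.sub(r"\s+", " ", no_tags).strip()


print(strip_tags(sample))
# prints: Customers are advised to follow KB4343902 for instructions. Patch: ADV180020
```

Only the link labels survive here; the URLs are dropped together with the tags.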
Thank you very much! This works great! Is it possible to still display the URL? I don't need it to work as a hyperlink, but just show up so someone can copy and paste it into a browser. I really appreciate the help and quick response!
Please submit a sample of the data.
Customers are advised to follow <A HREF='https://support.microsoft.com/en-ph/help/4343902/security-update-for-adobe-flash-player' TARGET='_blank'>KB4343902</A> for instructions pertaining to the remediation of these vulnerabilities.<P> <P>Patch:<br/> Following are links for downloading patches to fix the vulnerabilities: <P> <A HREF='https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/ADV180020' TARGET='_blank'>ADV180020</A>
Above is a sample of the data I get from our vulnerability system. I would like for it to read as such, but actually show the link URL instead of converting to a hyperlink:
Customers are advised to follow KB4343902 for instructions pertaining to the remediation of these vulnerabilities.
Patch: Following are links for downloading patches to fix the vulnerabilities:
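One possible way to get that kind of output is to capture the HREF value and the link label in groups and write both back as plain text before stripping the remaining tags. Below is a rough Python sketch of that idea, not a tested Splunk search. The names `LINK`, `TAG` and `html_to_text_with_urls`, and the "label (url)" output layout, are assumptions made for illustration. If Splunk's sed mode accepts `\1`-style backreferences in the replacement, the same capture-group approach might carry over to `rex mode=sed`, but I have not verified that here.

```python
import re

# Anchor tag with a quoted HREF: group 1 is the URL, group 2 is the link label.
LINK = re.compile(
    r"<A\s+HREF=['\"]([^'\"]+)['\"][^>]*>(.*?)</A>",
    re.IGNORECASE | re.DOTALL,
)
# Any other tag, e.g. <P> or <br/>.
TAG = re.compile(r"<[^>]+>")


def html_to_text_with_urls(text: str) -> str:
    """Turn links into 'label (url)' and drop every other tag."""
    # Rewrite each anchor so the URL stays visible as plain text.
    with_urls = LINK.sub(r"\2 (\1)", text)
    # Drop whatever tags remain, then collapse runs of whitespace.
    return re.sub(r"\s+", " ", TAG.sub("", with_urls)).strip()


sample = (
    "Customers are advised to follow "
    "<A HREF='https://support.microsoft.com/en-ph/help/4343902/security-update-for-adobe-flash-player' "
    "TARGET='_blank'>KB4343902</A> for instructions."
)
print(html_to_text_with_urls(sample))
```

With this layout the KB label and the URL both stay in the text, so someone can copy the address into a browser.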
When I read "removing the html tags from fields" I immediately thought about regular expressions.
Unfortunately you don't seem to want to remove them. You want to create a hyperlink.
I'm not sure if I can help you with that. Sorry. 😞
P.S.: I'm not even sure if that is possible at all.
Hi @ndsouza25 , as @marycordovacaa said please share some sample data... otherwise we won't be able to help you.