
Remove every occurrence of a pattern from _raw events

tommasoscarpa1
Path Finder

Hi,

 

I would like to remove every occurrence of a specific pattern from my _raw events.

Specifically, in this case I want to delete these HTML tags: <b>, </b>, <br>

 

Example, I have this raw event:

<b>This</b> is an <b>example</b><br>of raw<br>event

And I would like to transform it like this:

This is an exampleof rawevent

 

I tried to create this transforms.conf:

[remove_html_tags]
REGEX = <\/?br?>
FORMAT = 
DEST_KEY = _raw
 
And this props.conf:
[_sourcetype_]
TRANSFORMS-html_tags = remove_html_tags

But it doesn't work.
 
I also thought I could change the transforms.conf like this:
[remove_html_tags]
REGEX = (.*)<\/?br?>(.*)
FORMAT = $1$2
DEST_KEY = _raw

But this stops after a single substitution, and the REPEAT_MATCH setting is not an option because the documentation says:
NOTE: This setting is only valid for index-time field extractions.
  This setting is ignored if DEST_KEY is _raw.

And I have to set DEST_KEY = _raw.

 

 

Can you help me?

Thank you in advance.
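
A small Python sketch may help show why the second attempt stops after one substitution. It is not Splunk code: Python's re module stands in for Splunk's regex engine, the sample string assumes the closing tags are written </b>, and the helper names single_pass and strip_all are made up for illustration.

```python
import re

# Sample event from the question, assuming closing tags are written </b>.
raw = "<b>This</b> is an <b>example</b><br>of raw<br>event"

# Same pattern as in transforms.conf: matches <b>, </b>, <br> and </br>.
TAG = r"</?br?>"


def single_pass(event: str) -> str:
    """Mimic one match of (.*)TAG(.*) rewritten as $1$2 (hypothetical helper)."""
    m = re.search(r"(.*)" + TAG + r"(.*)", event)
    # The first (.*) is greedy, so the tag that gets dropped is the LAST one.
    return m.group(1) + m.group(2) if m else event


def strip_all(event: str) -> str:
    """Global substitution: remove every occurrence of the tag pattern."""
    return re.sub(TAG, "", event)


once = single_pass(raw)
clean = strip_all(raw)
print(once)
print(clean)
```

Here once still contains every tag except the final <br>, because the greedy (.*) runs up to the last match, while clean equals the desired output. Inside Splunk, the usual index-time counterpart of a global substitution is a sed-style line in props.conf such as SEDCMD-remove_html_tags = s/<\/?br?>//g; treat that as a pointer to verify against the props.conf documentation rather than a tested configuration.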


gcusello
SplunkTrust

Hi @tommasoscarpa1,

If you remove the XML tags, how will you recognize the fields?

Maybe you could set INDEXED_EXTRACTIONS = XML in your sourcetype definition so that all the fields are extracted.

Ciao.

Giuseppe


tommasoscarpa1
Path Finder

Hi Giuseppe,

I am not talking about XML tags, but about HTML tags. HTML tags only format the text and carry no information about fields: text between <b> and </b> is rendered in bold, and <br> is a line break.

I would like to remove these unnecessary characters from my inputs.

 

Ciao!
Tommaso
