Splunk Search

REX SED Help, need to replace namespaces from xml field

somesoni2
Revered Legend

Hi,

I have an XML field that holds values like the one below. Every element carries a namespace prefix, which I want to remove:

...message="<h:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<h:Header>
<h:creationTimestamp>2013-12-09T16:58:57.2450018+05:30</h:creationTimestamp>
<h:applicationId>XYZ</h:applicationId>
<h:hostName>Myhost</h:hostName>
</h:Header>
</h:Envelope>"

Obviously there could be more or different namespace prefixes (not just "h:") in the logs, so I can't hardcode the value to replace. I believe REX SED would be the appropriate method for my requirement.

I am very new to regex, so I don't know how to get started with the REX SED command. Could anyone give me some direction or an example of how to go about it?

1 Solution

emechler_splunk
Splunk Employee

If you wanted to do this at index-time (i.e. remove the content from the event before it's indexed into Splunk), then you could use SEDCMD to remove the content via props.conf:

[XML_sourcetype]
...
SEDCMD-null = s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g

If you're doing this inline, then the same regex should work with the rex command:

... | rex field=_raw mode=sed "s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g"
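
To sanity-check the expression outside Splunk, here is a minimal Python sketch that applies the same pattern to a shortened copy of the sample from the question. Python's re module stands in for Splunk's regex engine here, which is an assumption; the two agree on the constructs used. The second pattern, ANY_PREFIX_PATTERN, is a hypothetical variant (not from the answer) that swaps the hardcoded "h:" for any prefix.

```python
import re

# Shortened copy of the message value from the question.
MESSAGE = (
    '<h:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">\n'
    "<h:Header>\n"
    "<h:hostName>Myhost</h:hostName>\n"
    "</h:Header>\n"
    "</h:Envelope>"
)

# Pattern from the answer: keep the tag name (group 1) and the closing
# bracket (group 3), drop the xmlns declaration (group 2).
SED_PATTERN = r"(<h:\S+)\s+(xmlns:\S+)(>)"
result = re.sub(SED_PATTERN, r"\1\3", MESSAGE)

# Hypothetical variant: accept any prefix instead of the literal "h:".
ANY_PREFIX_PATTERN = r"(<[\w.-]+:\S+)\s+(xmlns:\S+)(>)"
other = re.sub(
    ANY_PREFIX_PATTERN,
    r"\1\3",
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">',
)

print(result.splitlines()[0])  # <h:Envelope>
print(other)                   # <soap:Envelope>
```

Both patterns expect a single xmlns declaration directly before the closing ">" and leave the prefix on the tag name in place.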

Cuyose
Builder

Another way you could do this at search time is to simply eval the XML portion with a regex replace

|rex field=_raw "(?i)request=(?<XML>.+)$"

|eval requestXML=replace(XML,"\w+:","")

|xmlkv requestXML
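
One thing to watch with the "\w+:" pattern is that it is not limited to tag names: any run of word characters followed by a colon is removed, including parts of timestamps and URLs in the element values. A minimal Python sketch of that behaviour, using one line of the sample from the question (Python's re module stands in for Splunk's regex engine, which is an assumption):

```python
import re

# One line of the sample event from the question.
LINE = "<h:creationTimestamp>2013-12-09T16:58:57.2450018+05:30</h:creationTimestamp>"

# Same pattern as the eval replace() call above.
matches = re.findall(r"\w+:", LINE)
stripped = re.sub(r"\w+:", "", LINE)

print(matches)   # ['h:', '09T16:', '58:', '05:', 'h:']
print(stripped)  # <creationTimestamp>2013-12-57.2450018+30</creationTimestamp>
```

If the element values matter, anchoring the pattern on the leading "<" or "</" of a tag is one way to keep the replacement away from the content.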

emechler_splunk
Splunk Employee

Sure, simply remove the "h:" from the capture group as such:
"s/(<)h:(\S+)\s+(xmlns:\S+)(>)/\1\2\4/g"
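
Note that this expression only touches the tag that carries the xmlns declaration, so the other tags, including the matching closing tag, keep their "h:" prefix. The Python sketch below shows that on a shortened copy of the sample, then tries a hypothetical two-pass alternative (strip_namespaces is an illustrative name, not something from this thread) that drops the declarations first and then removes any prefix from opening and closing tags. Python's re module stands in for Splunk's regex engine, which is an assumption, and the patterns have not been run through rex mode=sed here.

```python
import re

# Shortened copy of the message value from the question.
SAMPLE = (
    '<h:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">\n'
    "<h:Header>\n"
    "<h:applicationId>XYZ</h:applicationId>\n"
    "</h:Header>\n"
    "</h:Envelope>"
)

# The updated expression from the reply above, unchanged.
updated = re.sub(r"(<)h:(\S+)\s+(xmlns:\S+)(>)", r"\1\2\4", SAMPLE)


def strip_namespaces(xml: str) -> str:
    """Hypothetical two-pass cleanup, independent of the prefix used."""
    # Pass 1: drop xmlns="..." and xmlns:prefix="..." declarations.
    no_decl = re.sub(r'\s+xmlns(?::[\w.-]+)?="[^"]*"', "", xml)
    # Pass 2: drop the prefix from opening and closing tag names only.
    return re.sub(r"<(/?)[\w.-]+:", r"<\1", no_decl)


cleaned = strip_namespaces(SAMPLE)

print(updated.splitlines()[0], updated.splitlines()[-1])  # <Envelope> </h:Envelope>
print(cleaned.splitlines()[0], cleaned.splitlines()[-1])  # <Envelope> </Envelope>
```

Because the second pass is anchored on "<" and "</", colons inside element values such as timestamps are left alone. Prefixed attribute names are not handled by this sketch.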

somesoni2
Revered Legend

It worked fine and removed every occurrence of xmlns where "<h:" is present. Is it possible to remove the "h:" as well, so that I am left with just "<Envelope>" instead of "<h:Envelope>"?

emechler_splunk
Splunk Employee

Sorry, it looks like there was an extra backslash in the regex. Try the updated version above.

somesoni2
Revered Legend

I would like to do this at search time. Unfortunately, the rex command above doesn't work for me (even with the test data I provided).
