Splunk Search

REX SED Help, need to replace namespaces from xml field

somesoni2
Revered Legend

Hi,

I have an XML field that holds values like the one below. Each element carries a namespace prefix, which I want to remove:

...message="<h:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<h:Header>
<h:creationTimestamp>2013-12-09T16:58:57.2450018+05:30</h:creationTimestamp>
<h:applicationId>XYZ</h:applicationId>
<h:hostName>Myhost</h:hostName>
</h:Header>
</h:Envelope>"

Obviously there could be more or different namespace values (other than the "h:" prefix) in the logs, so I can't hardcode the value to replace. I believe REX SED would be the appropriate method for my requirement.

I am very new to regex, so I don't know how to get started with the REX SED command. Could anyone give me some direction or an example?

1 Solution

emechler_splunk
Splunk Employee

If you wanted to do this at index-time (i.e. remove the content from the event before it's indexed into Splunk), then you could use SEDCMD to remove the content via props.conf:

[XML_sourcetype]
...
SEDCMD-null = s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g

If you're doing this inline, then the same regex should work with the rex command:

... | rex field=_raw mode=sed "s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g"
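
One way to sanity-check this expression outside Splunk is to run the same pattern through Python's `re` module. This is only a sketch: Python is a stand-in for `rex mode=sed` here, the two regex engines are not identical, the sample is shortened from the question, and `strip_xmlns` is a hypothetical helper name.

```python
import re

# Same pattern as the expression above: an opening <h:...> tag name,
# whitespace, one xmlns:... attribute, then the closing ">".
PATTERN = re.compile(r'(<h:\S+)\s+(xmlns:\S+)(>)')

def strip_xmlns(text: str) -> str:
    # Keep group 1 (tag name) and group 3 (">"), drop the xmlns attribute.
    return PATTERN.sub(r'\1\3', text)

# Shortened version of the sample from the question, one tag per line.
sample = (
    '<h:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">\n'
    '<h:Header>\n'
    '<h:hostName>Myhost</h:hostName>\n'
    '</h:Header>\n'
    '</h:Envelope>'
)

print(strip_xmlns(sample))
```

In this sketch the xmlns attribute is dropped and the "h:" prefixes are left alone. Note that `\S+` is greedy, so the pattern relies on whitespace (here a newline) following the opening tag. If the whole document sat on one line with no whitespace, the match could run on to the last ">" in the event.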

Cuyose
Builder

Another way to do this at search time is to extract the XML portion and then eval it with a regex replace:

|rex field=_raw "(?i)request=(?<XML>.+)$"

|eval requestXML=replace(XML,"\w+:","")

|xmlkv requestXML
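
One caveat with `\w+:` is that it is not tied to a tag, so it also eats things like the hour and minute parts of a timestamp (and "http:" in a namespace URL). The sketch below uses Python's `re` module as a stand-in for Splunk's replace() to show the effect on one line from the question, next to a tag-anchored pattern. The anchored pattern and both helper names are my own assumptions, not something from this thread.

```python
import re

def strip_any_colon_word(text: str) -> str:
    # Mirrors the eval replace above: removes every word followed by a colon.
    return re.sub(r'\w+:', '', text)

def strip_tag_prefix(text: str) -> str:
    # Assumed alternative: only remove a prefix that directly follows "<" or "</".
    return re.sub(r'<(/?)\w+:', r'<\1', text)

line = '<h:creationTimestamp>2013-12-09T16:58:57.2450018+05:30</h:creationTimestamp>'

print(strip_any_colon_word(line))
# <creationTimestamp>2013-12-57.2450018+30</creationTimestamp>
print(strip_tag_prefix(line))
# <creationTimestamp>2013-12-09T16:58:57.2450018+05:30</creationTimestamp>
```

If the anchored form suits your data, the same pattern and a "<\1" replacement should carry over to replace() in eval, though I have not tested that in Splunk.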

emechler_splunk
Splunk Employee

Sure, simply remove the "h:" from the capture group as such:
"s/(<)h:(\S+)\s+(xmlns:\S+)(>)/\1\2\4/g"

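Worth knowing before relying on the expression in this reply: it only touches the tag that carries the xmlns attribute. A quick check with Python's `re` module (again just a stand-in for `rex mode=sed`, with a shortened sample from the question) suggests the other tags keep their "h:" prefix, including the closing tag of the element that was changed.

```python
import re

# The expression from the reply above, written with Python-style group references.
PATTERN = re.compile(r'(<)h:(\S+)\s+(xmlns:\S+)(>)')

# Shortened version of the sample from the question, one tag per line.
sample = (
    '<h:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">\n'
    '<h:Header>\n'
    '<h:hostName>Myhost</h:hostName>\n'
    '</h:Header>\n'
    '</h:Envelope>'
)

# Keep "<", the tag name and ">", dropping "h:" and the xmlns attribute.
result = PATTERN.sub(r'\1\2\4', sample)
print(result)
```

Here the first line becomes `<Envelope>` while `<h:Header>` and `</h:Envelope>` are unchanged, so the open and close tags no longer match. Stripping the prefix everywhere would probably need a second substitution that targets any prefix right after "<" or "</".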

somesoni2
Revered Legend

It worked fine and removed all occurrences of xmlns where "<h:" is present. Is it possible to remove the "h:" as well, so that I am left with just "<Envelope>" instead of "<h:Envelope>"?

emechler_splunk
Splunk Employee

Sorry, it looks like there was an extra backslash in the regex. Try the updated version above.

somesoni2
Revered Legend

I would like to do this at search time. Unfortunately, the above rex command doesn't work for me (even with the test data I provided).
