Splunk Search

REX SED Help, need to replace namespaces from xml field

somesoni2
Revered Legend

Hi,

I have a xml field which holds values like below. It contains namespaces for each element which I want to remove:

...message="<h:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<h:Header>
<h:creationTimestamp>2013-12-09T16:58:57.2450018+05:30</h:creationTimestamp>
<h:applicationId>XYZ</h:applicationId>
<h:hostName>Myhost</h:hostName>
</h:Header>
</h:Envelope>"

Obviously there could be more/different namespace values ("h:" prefix) in the the logs, so can't hardcode the value to replace it. I believe REX SED would be the appropriate method for my requirement.

I am very new to Regex so not able to start with REX SED command. Could anyone provide me some direction/example how to go about it?

Tags (2)
1 Solution

emechler_splunk
Splunk Employee
Splunk Employee

If you wanted to do this at index-time (i.e. remove the content from the event before it's indexed into Splunk), then you could use SEDCMD to remove the content via props.conf:

[XML_sourcetype]
...
SEDCMD-null = s/(<h:\S+)\s+(xmlns:\S+)(>)/\\1\3/g

If you're doing this inline, then the same regex should work with the rex command:

... | rex field=_raw mode=sed "s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g"

View solution in original post

Cuyose
Builder

Another way you could do this at search time is to simply eval the XML portion with a regex replace

|rex field=_raw "(?i)request=(?.+)$"

|eval requestXML=replace(XML,"\w+:","")

|xmlkv requestXML

0 Karma

emechler_splunk
Splunk Employee
Splunk Employee

If you wanted to do this at index-time (i.e. remove the content from the event before it's indexed into Splunk), then you could use SEDCMD to remove the content via props.conf:

[XML_sourcetype]
...
SEDCMD-null = s/(<h:\S+)\s+(xmlns:\S+)(>)/\\1\3/g

If you're doing this inline, then the same regex should work with the rex command:

... | rex field=_raw mode=sed "s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g"

emechler_splunk
Splunk Employee
Splunk Employee

Sure, simply remove the "h:" from the capture group as such:
"s/(<)h:(\S+)\s+(xmlns:\S+)(>)/\1\2\4/g"

0 Karma

somesoni2
Revered Legend

It worked fine and removing all occurrance of xmlns where "<h:" is present. Is is possible to remove the "h:" also so that I will be left with just "" instead of ""

0 Karma

emechler_splunk
Splunk Employee
Splunk Employee

Sorry, looked like there was an extra backslash in the regex. Try the updated version above.

0 Karma

somesoni2
Revered Legend

I would like to do this during search time. Unfortunately the above rex command doesn't work for me (event with test data I provided).

0 Karma
Get Updates on the Splunk Community!

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

This challenge was first posted on Slack #puzzles channelFor a previous puzzle, I needed a set of fixed-length ...

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

  &#x1f680; Your data just got a serious AI upgrade — are you ready? Say hello to the Agentic Era with the ...

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...