Splunk Search

REX SED Help, need to replace namespaces from xml field

somesoni2
Revered Legend

Hi,

I have a xml field which holds values like below. It contains namespaces for each element which I want to remove:

...message="<h:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<h:Header>
<h:creationTimestamp>2013-12-09T16:58:57.2450018+05:30</h:creationTimestamp>
<h:applicationId>XYZ</h:applicationId>
<h:hostName>Myhost</h:hostName>
</h:Header>
</h:Envelope>"

Obviously there could be more/different namespace values ("h:" prefix) in the the logs, so can't hardcode the value to replace it. I believe REX SED would be the appropriate method for my requirement.

I am very new to Regex so not able to start with REX SED command. Could anyone provide me some direction/example how to go about it?

Tags (2)
1 Solution

emechler_splunk
Splunk Employee
Splunk Employee

If you wanted to do this at index-time (i.e. remove the content from the event before it's indexed into Splunk), then you could use SEDCMD to remove the content via props.conf:

[XML_sourcetype]
...
SEDCMD-null = s/(<h:\S+)\s+(xmlns:\S+)(>)/\\1\3/g

If you're doing this inline, then the same regex should work with the rex command:

... | rex field=_raw mode=sed "s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g"

View solution in original post

Cuyose
Builder

Another way you could do this at search time is to simply eval the XML portion with a regex replace

|rex field=_raw "(?i)request=(?.+)$"

|eval requestXML=replace(XML,"\w+:","")

|xmlkv requestXML

0 Karma

emechler_splunk
Splunk Employee
Splunk Employee

If you wanted to do this at index-time (i.e. remove the content from the event before it's indexed into Splunk), then you could use SEDCMD to remove the content via props.conf:

[XML_sourcetype]
...
SEDCMD-null = s/(<h:\S+)\s+(xmlns:\S+)(>)/\\1\3/g

If you're doing this inline, then the same regex should work with the rex command:

... | rex field=_raw mode=sed "s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g"

emechler_splunk
Splunk Employee
Splunk Employee

Sure, simply remove the "h:" from the capture group as such:
"s/(<)h:(\S+)\s+(xmlns:\S+)(>)/\1\2\4/g"

0 Karma

somesoni2
Revered Legend

It worked fine and removing all occurrance of xmlns where "<h:" is present. Is is possible to remove the "h:" also so that I will be left with just "" instead of ""

0 Karma

emechler_splunk
Splunk Employee
Splunk Employee

Sorry, looked like there was an extra backslash in the regex. Try the updated version above.

0 Karma

somesoni2
Revered Legend

I would like to do this during search time. Unfortunately the above rex command doesn't work for me (event with test data I provided).

0 Karma
Get Updates on the Splunk Community!

Splunk Security Content for Threat Detection & Response, Q1 Roundup

Join Principal Threat Researcher, Michael Haag, as he walks through:An introduction to the Splunk Threat ...

Splunk Life | Happy Pride Month!

Happy Pride Month, Splunk Community! &#x1f308; In the United States, as well as many countries around the ...

SplunkTrust | Where Are They Now - Michael Uschmann

The Background Five years ago, Splunk published several videos showcasing members of the SplunkTrust to share ...