Parsing an XML file with namespaces

somesoni2
Revered Legend

Hi All,

I have a log whose events are XML with namespaces/xsl. Example log:

<soap:Envelope xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ 
xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Header>
<requestheader:RequestHeader>
<requestheader:SendingTimeStamp>2013-11-07T17:50:07-05:00</requestheader:SendingTimeStamp>
</requestheader:RequestHeader>
<soap:Body>
<audit:BroadcastAudit version="1.1">
<xcs:AuditInfo>
<xcs:MessageDate>20131107</xcs:MessageDate>
<xcs:MessageTime>175007-05:00</xcs:MessageTime>
<xcs:DestSys>XXX</xcs:DestSys>
<xcs:Message><****this is also some xml******></xcs:Message>
</xcs:AuditInfo></audit:BroadcastAudit></soap:Body></soap:Envelope>

I am able to index these events with proper timestamp recognition.
What I want to do is extract fields automatically from tags such as DestSys, MessageTime and MessageDate, and also the fields inside Message, which is itself XML.
I tried KV_MODE = xml in props.conf, but every field I get has the namespaces attached to its name (e.g. soap:Envelope:requestheader:SendingTimeStamp = 2013-11-07T17:50:07-05:00).

Is there any way to get the fields automatically, without the namespace/xsl prefixes?
Appreciate your help.

rojyates
Explorer

Here's a more generic approach. The following refinement of @martinh3's approach removes all namespace prefixes from element tags in one pass; the namespace declarations are left in place, where they do no harm:

rex field=_raw mode=sed "s/(<\/?)([\w\d-]+):(\w+)([ \/>])/\1\3\4/g"

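This removes any namespace prefix made up of word characters (letters, digits, underscore) or "-". The element name after the colon must consist of word characters only and be followed by a space, "/" or ">", so a name containing "-" or ".", or one followed directly by a line break, keeps its prefix.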

If you are applying this to the whole raw event, you can leave out 'field=_raw', since rex works on _raw by default. If you have extracted your XML into a field as part of a search, replace 'field=_raw' with 'field=yourfieldname'.
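To check what the sed expression does outside Splunk, here is a minimal Python sketch that applies an equivalent regular expression (equivalent for ASCII tag names; Python's re.sub accepts the same \1-style backreferences in the replacement) to a shortened copy of the sample event. The helper names strip_prefixes and leaf_fields are hypothetical, and leaf_fields is only a rough stand-in for field extraction: it collects simple <Name>value</Name> pairs and says nothing about how KV_MODE = xml works internally.

```python
import re

# Python version of the sed expression:
#   s/(<\/?)([\w\d-]+):(\w+)([ \/>])/\1\3\4/g
# Group 1 = "<" or "</", group 2 = prefix (dropped), group 3 = local name,
# group 4 = the character ending the tag name (space, "/" or ">").
PREFIX_RE = re.compile(r"(</?)([\w\d-]+):(\w+)([ />])")

# Leaf elements without child elements, e.g. <DestSys>XXX</DestSys>.
LEAF_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def strip_prefixes(xml_text: str) -> str:
    """Remove namespace prefixes from element tags; attributes are untouched."""
    return PREFIX_RE.sub(r"\1\3\4", xml_text)


def leaf_fields(xml_text: str) -> dict[str, str]:
    """Collect simple <Name>value</Name> pairs from un-prefixed XML text."""
    return {name: value for name, value in LEAF_RE.findall(xml_text)}


sample = (
    '<soap:Body><audit:BroadcastAudit version="1.1"><xcs:AuditInfo>'
    "<xcs:MessageDate>20131107</xcs:MessageDate>"
    "<xcs:MessageTime>175007-05:00</xcs:MessageTime>"
    "<xcs:DestSys>XXX</xcs:DestSys>"
    "</xcs:AuditInfo></audit:BroadcastAudit></soap:Body>"
)

cleaned = strip_prefixes(sample)
print(cleaned)
print(leaf_fields(cleaned))
```

A tag such as <ns:Some-Element> comes out unchanged, because the local-name group (\w+) does not accept "-"; widening that group to something like ([\w.-]+) covers hyphenated or dotted element names.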

martinh3
New Member

Might not be the correct way, but the only way I found to do it is by deleting the namespace prefixes. I had a few different ones in my file, so I needed 3 different "sed" statements, one to remove each. Like:

... | rex mode=sed "s/namespace1://g" | rex "begin XML: (?.*)" ...
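For comparison, the per-namespace approach amounts to one literal find-and-replace per known prefix. The Python sketch below assumes a hypothetical list of prefixes taken from the question's sample (substitute your own), and remove_known_prefixes is likewise a hypothetical helper name.

```python
# Hypothetical prefixes, taken from the question's sample; use your own.
PREFIXES = ["soap", "requestheader", "audit", "xcs"]


def remove_known_prefixes(text: str, prefixes: list[str]) -> str:
    """Mimic one rex mode=sed "s/<prefix>://g" step per prefix, in order."""
    for prefix in prefixes:
        # Plain global replacement: every "<prefix>:" anywhere in the text goes.
        text = text.replace(prefix + ":", "")
    return text


header = (
    "<soap:Header><requestheader:SendingTimeStamp>2013-11-07T17:50:07-05:00"
    "</requestheader:SendingTimeStamp></soap:Header>"
)
print(remove_known_prefixes(header, PREFIXES))
```

Unlike the regex approach, this replacement is not anchored to tags: it also rewrites "prefix:" inside attribute names (e.g. xsi:schemaLocation if xsi is listed) or text content, and a short prefix can match the tail of a longer word, so removing "ds:" would also turn "fields:" into "fiel".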