<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: XML File with namespaces parsing in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/XML-File-with-namespaces-parsing/m-p/135054#M27802</link>
    <pubDate>Tue, 29 Sep 2020 20:47:51 GMT</pubDate>
    <dc:creator>rojyates</dc:creator>
    <dc:date>2020-09-29T20:47:51Z</dc:date>
    <item>
      <title>XML File with namespaces parsing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/XML-File-with-namespaces-parsing/m-p/135052#M27800</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I have a log whose events are XML with namespace prefixes. Example event:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;soap:Envelope xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"&amp;gt;
&amp;lt;soap:Header&amp;gt;
&amp;lt;requestheader:RequestHeader&amp;gt;
&amp;lt;requestheader:SendingTimeStamp&amp;gt;2013-11-07T17:50:07-05:00&amp;lt;/requestheader:SendingTimeStamp&amp;gt;
&amp;lt;/requestheader:RequestHeader&amp;gt;
&amp;lt;/soap:Header&amp;gt;
&amp;lt;soap:Body&amp;gt;
&amp;lt;audit:BroadcastAudit version="1.1"&amp;gt;
&amp;lt;xcs:AuditInfo&amp;gt;
&amp;lt;xcs:MessageDate&amp;gt;20131107&amp;lt;/xcs:MessageDate&amp;gt;
&amp;lt;xcs:MessageTime&amp;gt;175007-05:00&amp;lt;/xcs:MessageTime&amp;gt;
&amp;lt;xcs:DestSys&amp;gt;XXX&amp;lt;/xcs:DestSys&amp;gt;
&amp;lt;xcs:Message&amp;gt;&amp;lt;****this is also some xml******&amp;gt;&amp;lt;/xcs:Message&amp;gt;
&amp;lt;/xcs:AuditInfo&amp;gt;&amp;lt;/audit:BroadcastAudit&amp;gt;&amp;lt;/soap:Body&amp;gt;&amp;lt;/soap:Envelope&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I am able to index these events with proper timestamp recognition. &lt;BR /&gt;
What I want to do is extract fields automatically from tags like DestSys, MessageTime and MessageDate, and also the fields inside Message, whose content is itself XML. &lt;BR /&gt;
I tried KV_MODE = xml in props.conf, but every field name I get carries the namespace prefixes as well (e.g. soap:Envelope:requestheader:SendingTimeStamp = 2013-11-07T17:50:07-05:00).&lt;/P&gt;

&lt;P&gt;Is there any way to get these fields extracted automatically, without the namespace prefixes?&lt;BR /&gt;
Appreciate your help.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Nov 2013 19:12:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/XML-File-with-namespaces-parsing/m-p/135052#M27800</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2013-11-08T19:12:37Z</dc:date>
    </item>
    <item>
      <title>Re: XML File with namespaces parsing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/XML-File-with-namespaces-parsing/m-p/135053#M27801</link>
      <description>&lt;P&gt;This might not be the correct way, but the only way I found was to delete the namespace prefixes. I had a few different ones in my file, so I needed three separate sed statements, one per prefix, like this:&lt;/P&gt;

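&lt;P&gt;&lt;CODE&gt;... | rex mode=sed "s/namespace1://g" | rex "begin XML: (?&amp;lt;xml&amp;gt;.*)" ...&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Here namespace1 stands for one of your prefixes (repeat the sed step for each one), and xml is just an example name for the captured field.&lt;/P&gt;</description>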
      <pubDate>Fri, 04 Dec 2015 23:34:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/XML-File-with-namespaces-parsing/m-p/135053#M27801</guid>
      <dc:creator>martinh3</dc:creator>
      <dc:date>2015-12-04T23:34:13Z</dc:date>
    </item>
    <item>
      <title>Re: XML File with namespaces parsing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/XML-File-with-namespaces-parsing/m-p/135054#M27802</link>
      <description>&lt;P&gt;Here's a more generic approach:&lt;BR /&gt;
The following refinement of &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/80003"&gt;@martinh3&lt;/a&gt;'s approach removes every namespace prefix from element tags in a single pass. The namespace declarations themselves are left in place, which is harmless:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;rex field=_raw mode=sed "s/(&amp;lt;\/?)([\w\d-]+):(\w+)([ \/&amp;gt;])/\1\3\4/g"&lt;/CODE&gt;&lt;/P&gt;
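
&lt;P&gt;To see what the substitution does, you can try it on a small made-up fragment based on the sample in the question. This is a quick sketch: makeresults just generates one dummy event, and the XML string is illustrative only:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval _raw="&amp;lt;xcs:AuditInfo&amp;gt;&amp;lt;xcs:DestSys&amp;gt;XXX&amp;lt;/xcs:DestSys&amp;gt;&amp;lt;/xcs:AuditInfo&amp;gt;"
| rex field=_raw mode=sed "s/(&amp;lt;\/?)([\w\d-]+):(\w+)([ \/&amp;gt;])/\1\3\4/g"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;After this, _raw should read &lt;CODE&gt;&amp;lt;AuditInfo&amp;gt;&amp;lt;DestSys&amp;gt;XXX&amp;lt;/DestSys&amp;gt;&amp;lt;/AuditInfo&amp;gt;&lt;/CODE&gt;. Group 1 keeps the "&amp;lt;" or "&amp;lt;/", group 2 (the prefix) is dropped, and groups 3 and 4 keep the local name and the character that follows it.&lt;/P&gt;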

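&lt;P&gt;This strips the prefix from every opening and closing tag whose prefix is made up of word characters (letters, digits, underscore) or "-". It only touches element names: prefixed attributes such as xsi:schemaLocation are not changed. The local name after the colon must consist of word characters and be followed by a space, "/" or "&amp;gt;".&lt;/P&gt;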
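
&lt;P&gt;Putting it together, a search along these lines should give you plain field names such as DestSys, MessageDate and MessageTime. This is a sketch that has not been tested against your data. The index and sourcetype values are placeholders. xmlkv is used here as one way to turn the de-prefixed tags into key=value fields:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=your_index sourcetype=your_sourcetype
| rex field=_raw mode=sed "s/(&amp;lt;\/?)([\w\d-]+):(\w+)([ \/&amp;gt;])/\1\3\4/g"
| xmlkv
| table _time MessageDate MessageTime DestSys
&lt;/CODE&gt;&lt;/PRE&gt;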

&lt;P&gt;If you are applying this to the whole raw event, you can leave out 'field=_raw', since rex works on _raw by default. If you have extracted your XML into a field as part of a search, replace 'field=_raw' with 'field=yourfieldname'.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 20:47:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/XML-File-with-namespaces-parsing/m-p/135054#M27802</guid>
      <dc:creator>rojyates</dc:creator>
      <dc:date>2020-09-29T20:47:51Z</dc:date>
    </item>
  </channel>
</rss>

