<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Need help to parse &amp; flatten XML Attribute data in nested format. in Dashboards &amp; Visualizations</title>
    <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423948#M27914</link>
    <description>&lt;P&gt;@niketnilay how do you add new python modules to the Splunk Python? &lt;/P&gt;

&lt;P&gt;My idea is to convert the XML to JSON using python, in the same python input script that we have scheduled to import the data. &lt;/P&gt;

&lt;P&gt;The inbuilt Splunk python might not have XML-JSON conversion modules and we might need to import/install those specific modules to do this in the script. &lt;/P&gt;</description>
    <pubDate>Fri, 24 Aug 2018 05:59:22 GMT</pubDate>
    <dc:creator>anirbandasdeb</dc:creator>
    <dc:date>2018-08-24T05:59:22Z</dc:date>
    <item>
      <title>Need help to parse &amp; flatten XML Attribute data in nested format.</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423943#M27909</link>
      <description>&lt;P&gt;We have data coming in XML in the following format: &lt;/P&gt;

&lt;P&gt;Sample Event 1:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;Machine ServerID="ArCRw3ep7S1GJTEl4ZNgwJHWLjSd446mG2lyLus7Ira8" MachineID="3D3F7ECE-D01E-98A7-4F05-871905F2AB37" Prefix="3D" Name="TRAY PACKER NO.4"&amp;gt;
    &amp;lt;Data Version="5.0.6392.4047"&amp;gt;
        &amp;lt;Event Start="08/22/2018 11:00:00.000" End="08/22/2018 11:56:55.310" Job="4009396" Instance="7c7e630aab6b4ed2833920037203ec17" Shift="Shift 1"&amp;gt;
            &amp;lt;Status Name="Running" StatusCount="28" Duration="3289562.2199" Production="973" Scrap="1" /&amp;gt;
            &amp;lt;Status Name="Starved" StatusCount="28" Duration="125748" /&amp;gt;
        &amp;lt;/Event&amp;gt;        
    &amp;lt;/Data&amp;gt;
&amp;lt;/Machine&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Sample Event 2:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;Machine ServerID="ArCRw3ep7S1GJTEl4ZNgwJHWLjSd446mG2lyLus7Ira8" MachineID="3D3F7ECE-D01E-98A7-4F05-871905F2AB37" Prefix="3D" Name="TRAY PACKER NO.4"&amp;gt;
    &amp;lt;Data Version="5.0.6392.4047"&amp;gt;
        &amp;lt;Event Start="08/22/2018 11:00:00.000" End="08/22/2018 11:56:55.310" Job="4009396" Instance="7c7e630aab6b4ed2833920037203ec17" Shift="Shift 1"&amp;gt;
            &amp;lt;Status Name="Running" StatusCount="28" Duration="3289562.2199" Production="973" Scrap="1" /&amp;gt;
            &amp;lt;Status Name="Starved" StatusCount="28" Duration="125748" /&amp;gt;
        &amp;lt;/Event&amp;gt;
        &amp;lt;Event Start="08/22/2018 11:56:55.310" End="08/22/2018 12:56:55.310" Job="4009375" Instance="7c7e630aab6b4ed2833920037203ec17" Shift="Shift 1"&amp;gt;
            &amp;lt;Status Name="Running" StatusCount="28" Duration="3289562.2199" Production="973" Scrap="1" /&amp;gt;
            &amp;lt;Status Name="Starved" StatusCount="28" Duration="125748" /&amp;gt;
        &amp;lt;/Event&amp;gt;
    &amp;lt;/Data&amp;gt;
&amp;lt;/Machine&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Please note that the data is exclusively in XML &lt;STRONG&gt;attributes&lt;/STRONG&gt;, and not in elements. &lt;BR /&gt;
I am aware that we can possibly do it via Python pre-processing, but for now we need to flatten out the data using SPL. &lt;/P&gt;

&lt;P&gt;We have tried multiple combinations of &lt;STRONG&gt;spath&lt;/STRONG&gt; and &lt;STRONG&gt;mvexpand&lt;/STRONG&gt;. However, since data is in attribute tags, we cannot split it into separate rows to show in a table form, when it is of the form given in the second XML event. &lt;/P&gt;

&lt;P&gt;I am not sure we can handle this using a &lt;STRONG&gt;regex&lt;/STRONG&gt; since, apart from a few, the attributes are not uniform throughout. &lt;/P&gt;

&lt;P&gt;Can someone please help? &lt;/P&gt;

&lt;P&gt;Thanks in advance. &lt;/P&gt;

&lt;P&gt;Regards,&lt;BR /&gt;
Anirban. &lt;/P&gt;</description>
      <pubDate>Wed, 22 Aug 2018 13:48:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423943#M27909</guid>
      <dc:creator>anirbandasdeb</dc:creator>
      <dc:date>2018-08-22T13:48:39Z</dc:date>
    </item>
    <item>
      <title>Re: Need help to parse &amp; flatten XML Attribute data in nested format.</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423944#M27910</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/188962"&gt;@ppablo&lt;/a&gt;_splunk , &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/130946"&gt;@mstjohn_splunk&lt;/a&gt; &lt;STRONG&gt;Thank You&lt;/STRONG&gt; for updating and correcting my English!! I was absolutely exhausted after a long &amp;amp; grueling day and didn't stop to check my grammar. &lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 21:01:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423944#M27910</guid>
      <dc:creator>anirbandasdeb</dc:creator>
      <dc:date>2020-09-29T21:01:07Z</dc:date>
    </item>
    <item>
      <title>Re: Need help to parse &amp; flatten XML Attribute data in nested format.</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423945#M27911</link>
      <description>&lt;P&gt;@anirbandasdeb, if possible it is better if you get Null fields inserted as &lt;CODE&gt;NA&lt;/CODE&gt; or any other default values before data gets indexed in Splunk rather than doing this at Search Time. While using &lt;CODE&gt;spath&lt;/CODE&gt; command the DOM structure should remain the same. Also the XML structure is complex as meaningful data is available at different hierarchy (several interesting fields at different levels).&lt;/P&gt;

&lt;P&gt;However, to begin with you can try the following run anywhere search example based on Sample Data provided in the question. &lt;BR /&gt;
PS: &lt;BR /&gt;
1. I have replaced empty values for Production and Scrap with &lt;CODE&gt;NA&lt;/CODE&gt; using &lt;CODE&gt;eval&lt;/CODE&gt; on &lt;CODE&gt;_raw&lt;/CODE&gt; data, which will be an expensive query considering total events you will operate upon.&lt;BR /&gt;
2. I extracted interesting fields from two levels in the XML. i.e. &lt;CODE&gt;Machine.Data.Event node attributes&lt;/CODE&gt; and the &lt;CODE&gt;Event&lt;/CODE&gt; tree below it. If you need more data to be extracted, similar concept needs to be extended at each level i.e.&lt;BR /&gt;
    a. &lt;CODE&gt;mvzip()&lt;/CODE&gt; to stitch multi value fields together&lt;BR /&gt;
    b. &lt;CODE&gt;mvexpand&lt;/CODE&gt; to form separate events&lt;BR /&gt;
    c. &lt;CODE&gt;split()/makemv&lt;/CODE&gt; to split the values as fields again.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval _raw="&amp;lt;Machine ServerID=\"ArCRw3ep7S1GJTEl4ZNgwJHWLjSd446mG2lyLus7Ira8\" MachineID=\"3D3F7ECE-D01E-98A7-4F05-871905F2AB37\" Prefix=\"3D\" Name=\"TRAY PACKER NO.4\"&amp;gt;
     &amp;lt;Data Version=\"5.0.6392.4047\"&amp;gt;
         &amp;lt;Event Start=\"08/22/2018 11:00:00.000\" End=\"08/22/2018 11:56:55.310\" Job=\"4009396\" Instance=\"7c7e630aab6b4ed2833920037203ec17\" Shift=\"Shift 1\"&amp;gt;
             &amp;lt;Status Name=\"Running\" StatusCount=\"28\" Duration=\"3289562.2199\" Production=\"973\" Scrap=\"1\" /&amp;gt;
             &amp;lt;Status Name=\"Starved\" StatusCount=\"28\" Duration=\"125748\" /&amp;gt;
         &amp;lt;/Event&amp;gt;
         &amp;lt;Event Start=\"08/22/2018 11:56:55.310\" End=\"08/22/2018 12:56:55.310\" Job=\"4009375\" Instance=\"7c7e630aab6b4ed2833920037203ec17\" Shift=\"Shift 1\"&amp;gt;
             &amp;lt;Status Name=\"Running\" StatusCount=\"28\" Duration=\"3289562.2199\" Production=\"973\" Scrap=\"1\" /&amp;gt;
             &amp;lt;Status Name=\"Starved\" StatusCount=\"28\" Duration=\"125748\" /&amp;gt;
         &amp;lt;/Event&amp;gt;
     &amp;lt;/Data&amp;gt;
 &amp;lt;/Machine&amp;gt;"
| append [| makeresults
| eval _raw=" &amp;lt;Machine ServerID=\"ArCRw3ep7S1GJTEl4ZNgwJHWLjSd446mG2lyLus7Ira8\" MachineID=\"3D3F7ECE-D01E-98A7-4F05-871905F2AB37\" Prefix=\"3D\" Name=\"TRAY PACKER NO.4\"&amp;gt;
     &amp;lt;Data Version=\"5.0.6392.4047\"&amp;gt;
         &amp;lt;Event Start=\"08/22/2018 11:00:00.000\" End=\"08/22/2018 11:56:55.310\" Job=\"4009396\" Instance=\"7c7e630aab6b4ed2833920037203ec17\" Shift=\"Shift 1\"&amp;gt;
             &amp;lt;Status Name=\"Running\" StatusCount=\"28\" Duration=\"3289562.2199\" Production=\"973\" Scrap=\"1\" /&amp;gt;
             &amp;lt;Status Name=\"Starved\" StatusCount=\"28\" Duration=\"125748\" /&amp;gt;
         &amp;lt;/Event&amp;gt;        
     &amp;lt;/Data&amp;gt;
 &amp;lt;/Machine&amp;gt;"]
| eval _raw=replace(_raw,"(Duration=\"[^\"]+\")\s\/\&amp;gt;","\1 Production=\"NA\" Scrap=\"NA\" /&amp;gt;")
| spath path=Machine.Data.Event output=Event
| spath path=Machine.Data.Event{@Start} output=Start
| spath path=Machine.Data.Event{@End} output=End
| spath path=Machine.Data.Event{@Job} output=Job
| eval data=mvzip(Job,(mvzip(Start,mvzip(End,Event))))
| fields - _*, Start End Job Event
| mvexpand data
| makemv data delim=","
| eval Job=mvindex(data,0),Start=mvindex(data,1),End=mvindex(data,2),_raw=mvindex(data,3)
| fields - data
| spath
| rename "Status{@*}" as *
| eval data=mvzip(Name,(mvzip(Production,mvzip(Scrap,mvzip(StatusCount,Duration)))))
| fields - _*, Name Production Scrap StatusCount Duration
| mvexpand data
| makemv data delim=","
| eval Name=mvindex(data,0),Production=mvindex(data,1),Scrap=mvindex(data,2),StatusCount=mvindex(data,3),Duration=mvindex(data,4)
| fields - Data
| table Job Start End Name Production Scrap StatusCount Duration
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;PS: Since most of multi valued commands and eval on _raw data would be expensive to be performed at Search Time, you should consider following performance improvement methods:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Check feasibility of streamlining XML data at source system to have all attributes/field name present (with default value instead of no field name at all).&lt;/LI&gt;
&lt;LI&gt;Ideally if you Have Forwarder, see if Regular Expression based replace can be used to set default values in case fields/attributes are missing in XML data i.e. through REGEX &lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/Admin/Transformsconf"&gt;transforms&lt;/A&gt; or &lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata#Define_the_sed_script_in_props.conf"&gt;SEDCMD&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;If you have sufficient license, you can use &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect"&gt;collect&lt;/A&gt; to perform search similar to the one above to extract all required fields and index to a new index by scheduling the search query with collect command. Your actual Dashboard/Report query will run on the new index only. License cost will be based on whether you want default &lt;CODE&gt;sourcetype=stash&lt;/CODE&gt; to be overridden with your custom sourcetype or not. Read documentation for details.&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Thu, 23 Aug 2018 08:23:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423945#M27911</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2018-08-23T08:23:37Z</dc:date>
    </item>
    <item>
      <title>Re: Need help to parse &amp; flatten XML Attribute data in nested format.</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423946#M27912</link>
      <description>&lt;P&gt;@anirbandasdeb Just tweaked a few things! No problem! It is a good question &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 23 Aug 2018 15:59:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423946#M27912</guid>
      <dc:creator>mstjohn_splunk</dc:creator>
      <dc:date>2018-08-23T15:59:49Z</dc:date>
    </item>
    <item>
      <title>Re: Need help to parse &amp; flatten XML Attribute data in nested format.</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423947#M27913</link>
      <description>&lt;P&gt;@niketnilay this is a lot!! &lt;BR /&gt;
Thanks for looking into this at depth. &lt;/P&gt;

&lt;P&gt;Regarding the source system, its a third party tool called Shoplogix, and data is pulled via web queries. &lt;BR /&gt;
Ideally I would have preferred JSON over XML, but for this result set, they have only XML response to the web query and we cannot streamline the response any further without Python processing. &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;The summary index idea sounds promising as well. &lt;/P&gt;

&lt;P&gt;I will test these out and come back with more updates. &lt;/P&gt;

&lt;P&gt;Thanks again!!&lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 05:55:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423947#M27913</guid>
      <dc:creator>anirbandasdeb</dc:creator>
      <dc:date>2018-08-24T05:55:22Z</dc:date>
    </item>
    <item>
      <title>Re: Need help to parse &amp; flatten XML Attribute data in nested format.</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423948#M27914</link>
      <description>&lt;P&gt;@niketnilay how do you add new python modules to the Splunk Python? &lt;/P&gt;

&lt;P&gt;My idea is to convert the XML to JSON using python, in the same python input script that we have scheduled to import the data. &lt;/P&gt;

&lt;P&gt;The inbuilt Splunk python might not have XML-JSON conversion modules and we might need to import/install those specific modules to do this in the script. &lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 05:59:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423948#M27914</guid>
      <dc:creator>anirbandasdeb</dc:creator>
      <dc:date>2018-08-24T05:59:22Z</dc:date>
    </item>
    <item>
      <title>Re: Need help to parse &amp; flatten XML Attribute data in nested format.</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423949#M27915</link>
      <description>&lt;P&gt;@anirbandasdeb I am not sure at which stage processing input XML data you want to integrate your custom Script. Before indexing to Splunk (Scripted Input) or During Search Time (when it does not matter whether the data is XML or JSON as spath command remains the same for both).&lt;/P&gt;

&lt;P&gt;If you are configuring your own Scripted input (You should eventually consider &lt;A href="https://splunkbase.splunk.com/app/2962/"&gt;Splunk Add On Builder&lt;/A&gt; once the script is fully tested). You can refer to the following Splunk Answers with some leads: &lt;A href="https://answers.splunk.com/answers/612626/integrating-splunk-dashboard-with-python.html"&gt;https://answers.splunk.com/answers/612626/integrating-splunk-dashboard-with-python.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Do try out and confirm. Also Accept Answer/ Up Vote comments that help once this works for you.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 06:52:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423949#M27915</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2018-08-24T06:52:42Z</dc:date>
    </item>
    <item>
      <title>Re: Need help to parse &amp; flatten XML Attribute data in nested format.</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423950#M27916</link>
      <description>&lt;P&gt;@niketnilay &lt;/P&gt;

&lt;P&gt;How do I import libraries into the script??&lt;/P&gt;

&lt;P&gt;Ex.: &lt;A href="https://docs.splunk.com/Documentation/Splunk/7.1.2/AdvancedDev/ScriptExample#Script_example.2C_poll_a_database_.28Python.29"&gt;https://docs.splunk.com/Documentation/Splunk/7.1.2/AdvancedDev/ScriptExample#Script_example.2C_poll_a_database_.28Python.29&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Here they are importing the _mssql library. If this library is not native to the Splunk Python, how do I install it? Can I just put the library in the same folder as the script? &lt;/P&gt;

&lt;P&gt;I have not used the Add-On Builder yet, hence ignorant of the options it provides to deploy scripts. &lt;/P&gt;</description>
      <pubDate>Mon, 27 Aug 2018 03:39:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423950#M27916</guid>
      <dc:creator>anirbandasdeb</dc:creator>
      <dc:date>2018-08-27T03:39:44Z</dc:date>
    </item>
    <item>
      <title>Re: Need help to parse &amp; flatten XML Attribute data in nested format.</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423951#M27917</link>
      <description>&lt;P&gt;Yes you can try that.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Aug 2018 04:56:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Need-help-to-parse-flatten-XML-Attribute-data-in-nested-format/m-p/423951#M27917</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2018-08-28T04:56:38Z</dc:date>
    </item>
  </channel>
</rss>

