We have data coming in as XML in the following format:
Sample Event 1:
<Machine ServerID="ArCRw3ep7S1GJTEl4ZNgwJHWLjSd446mG2lyLus7Ira8" MachineID="3D3F7ECE-D01E-98A7-4F05-871905F2AB37" Prefix="3D" Name="TRAY PACKER NO.4">
<Data Version="5.0.6392.4047">
<Event Start="08/22/2018 11:00:00.000" End="08/22/2018 11:56:55.310" Job="4009396" Instance="7c7e630aab6b4ed2833920037203ec17" Shift="Shift 1">
<Status Name="Running" StatusCount="28" Duration="3289562.2199" Production="973" Scrap="1" />
<Status Name="Starved" StatusCount="28" Duration="125748" />
</Event>
</Data>
</Machine>
Sample Event 2:
<Machine ServerID="ArCRw3ep7S1GJTEl4ZNgwJHWLjSd446mG2lyLus7Ira8" MachineID="3D3F7ECE-D01E-98A7-4F05-871905F2AB37" Prefix="3D" Name="TRAY PACKER NO.4">
<Data Version="5.0.6392.4047">
<Event Start="08/22/2018 11:00:00.000" End="08/22/2018 11:56:55.310" Job="4009396" Instance="7c7e630aab6b4ed2833920037203ec17" Shift="Shift 1">
<Status Name="Running" StatusCount="28" Duration="3289562.2199" Production="973" Scrap="1" />
<Status Name="Starved" StatusCount="28" Duration="125748" />
</Event>
<Event Start="08/22/2018 11:56:55.310" End="08/22/2018 12:56:55.310" Job="4009375" Instance="7c7e630aab6b4ed2833920037203ec17" Shift="Shift 1">
<Status Name="Running" StatusCount="28" Duration="3289562.2199" Production="973" Scrap="1" />
<Status Name="Starved" StatusCount="28" Duration="125748" />
</Event>
</Data>
</Machine>
Please note that the data is exclusively in XML attributes, and not in elements.
I am aware that we can possibly do it via Python pre-processing, but for now we need to flatten out the data using SPL.
We have tried multiple combinations of spath and mvexpand. However, since the data is in attributes, we cannot split it into separate rows to display in table form when it is of the shape shown in the second sample event.
I am not sure we can handle this with a regex either, since apart from a few, the attributes are not uniform throughout.
Can someone please help?
Thanks in advance.
Regards,
Anirban.

@anirbandasdeb, if possible it is better to have null fields inserted as NA (or some other default value) before the data gets indexed in Splunk, rather than doing this at search time. For the spath command to work, the DOM structure should stay consistent, and this XML structure is complex because the meaningful data sits at different levels of the hierarchy (several interesting fields at different depths).
However, to begin with you can try the following run-anywhere search, based on the sample data provided in the question.
PS:
1. I have replaced the missing Production and Scrap attributes with NA using eval on _raw, which will be an expensive query considering the total number of events you will operate on.
2. I extracted the interesting fields from two levels of the XML, i.e. the Machine.Data.Event node attributes and the Event tree below it. If you need more data extracted, the same concept has to be repeated at each level, i.e.:
a. mvzip() to stitch multivalue fields together,
b. mvexpand to form separate events,
c. split()/makemv to split the values back out into fields.
| makeresults
| eval _raw="<Machine ServerID=\"ArCRw3ep7S1GJTEl4ZNgwJHWLjSd446mG2lyLus7Ira8\" MachineID=\"3D3F7ECE-D01E-98A7-4F05-871905F2AB37\" Prefix=\"3D\" Name=\"TRAY PACKER NO.4\">
<Data Version=\"5.0.6392.4047\">
<Event Start=\"08/22/2018 11:00:00.000\" End=\"08/22/2018 11:56:55.310\" Job=\"4009396\" Instance=\"7c7e630aab6b4ed2833920037203ec17\" Shift=\"Shift 1\">
<Status Name=\"Running\" StatusCount=\"28\" Duration=\"3289562.2199\" Production=\"973\" Scrap=\"1\" />
<Status Name=\"Starved\" StatusCount=\"28\" Duration=\"125748\" />
</Event>
<Event Start=\"08/22/2018 11:56:55.310\" End=\"08/22/2018 12:56:55.310\" Job=\"4009375\" Instance=\"7c7e630aab6b4ed2833920037203ec17\" Shift=\"Shift 1\">
<Status Name=\"Running\" StatusCount=\"28\" Duration=\"3289562.2199\" Production=\"973\" Scrap=\"1\" />
<Status Name=\"Starved\" StatusCount=\"28\" Duration=\"125748\" />
</Event>
</Data>
</Machine>"
| append [| makeresults
| eval _raw=" <Machine ServerID=\"ArCRw3ep7S1GJTEl4ZNgwJHWLjSd446mG2lyLus7Ira8\" MachineID=\"3D3F7ECE-D01E-98A7-4F05-871905F2AB37\" Prefix=\"3D\" Name=\"TRAY PACKER NO.4\">
<Data Version=\"5.0.6392.4047\">
<Event Start=\"08/22/2018 11:00:00.000\" End=\"08/22/2018 11:56:55.310\" Job=\"4009396\" Instance=\"7c7e630aab6b4ed2833920037203ec17\" Shift=\"Shift 1\">
<Status Name=\"Running\" StatusCount=\"28\" Duration=\"3289562.2199\" Production=\"973\" Scrap=\"1\" />
<Status Name=\"Starved\" StatusCount=\"28\" Duration=\"125748\" />
</Event>
</Data>
</Machine>"]
| eval _raw=replace(_raw,"(Duration=\"[^\"]+\")\s\/\>","\1 Production=\"NA\" Scrap=\"NA\" />")
| spath path=Machine.Data.Event output=Event
| spath path=Machine.Data.Event{@Start} output=Start
| spath path=Machine.Data.Event{@End} output=End
| spath path=Machine.Data.Event{@Job} output=Job
| eval data=mvzip(Job,(mvzip(Start,mvzip(End,Event))))
| fields - _*, Start End Job Event
| mvexpand data
| makemv data delim=","
| eval Job=mvindex(data,0),Start=mvindex(data,1),End=mvindex(data,2),_raw=mvindex(data,3)
| fields - data
| spath
| rename "Status{@*}" as *
| eval data=mvzip(Name,(mvzip(Production,mvzip(Scrap,mvzip(StatusCount,Duration)))))
| fields - _*, Name Production Scrap StatusCount Duration
| mvexpand data
| makemv data delim=","
| eval Name=mvindex(data,0),Production=mvindex(data,1),Scrap=mvindex(data,2),StatusCount=mvindex(data,3),Duration=mvindex(data,4)
| fields - Data
| table Job Start End Name Production Scrap StatusCount Duration
PS: Since most of the multivalue commands and the eval on _raw are expensive to run at search time, you should consider the following performance improvements:
- Check the feasibility of streamlining the XML data at the source system so that every attribute/field name is always present (with a default value rather than the attribute being omitted altogether).
- Ideally, if you have a forwarder, see whether a regular-expression-based replace can set the default values when fields/attributes are missing from the XML, i.e. through a REGEX transform or SEDCMD.
- If you have sufficient license headroom, you can schedule a search similar to the one above with the collect command, so that all required fields are extracted and written to a new (summary) index; your actual dashboard/report queries then run only on that index. The license cost depends on whether you keep the default sourcetype=stash or override it with a custom sourcetype; read the documentation for details.
| makeresults | eval message= "Happy Splunking!!!"

@niketnilay this is a lot!!
Thanks for looking into this in depth.
Regarding the source system, it's a third-party tool called Shoplogix, and the data is pulled via web queries.
Ideally I would have preferred JSON over XML, but for this result set they only provide an XML response to the web query, and we cannot streamline the response any further without Python processing. 😞
The summary index idea sounds promising as well.
I will test these out and come back with more updates.
Thanks again!!

@niketnilay how do you add new Python modules to the Python that ships with Splunk?
My idea is to convert the XML to JSON using Python, in the same Python input script that we have scheduled to import the data.
The built-in Splunk Python might not have XML-to-JSON conversion modules, and we might need to install those specific modules to do this in the script.
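For the conversion itself, here is a minimal sketch based only on the sample XML in the question (not Shoplogix-specific code). It uses xml.etree.ElementTree and json, both of which are part of the standard library in the Python bundled with Splunk, so no extra modules are needed for this step. The field-name prefixes and the sample file name are illustrative choices, not requirements.

import json
import xml.etree.ElementTree as ET

def flatten_machine_xml(xml_text):
    # Emit one flat record per <Status> element, carrying along the attributes
    # of the enclosing <Machine>, <Data> and <Event> elements.
    machine = ET.fromstring(xml_text)
    data = machine.find("Data")
    for event in machine.iter("Event"):
        for status in event.findall("Status"):
            row = {}
            # Prefix each attribute with its element name so that e.g. the
            # Machine "Name" and the Status "Name" do not collide.
            row.update({"Machine_" + k: v for k, v in machine.attrib.items()})
            row.update({"Data_" + k: v for k, v in data.attrib.items()})
            row.update({"Event_" + k: v for k, v in event.attrib.items()})
            row.update({"Status_" + k: v for k, v in status.attrib.items()})
            yield row

if __name__ == "__main__":
    with open("shoplogix_sample.xml") as f:   # hypothetical file name
        xml_text = f.read()
    # One JSON object per line; in a scripted input these lines go to stdout
    # and can be indexed with a JSON-aware sourcetype.
    for row in flatten_machine_xml(xml_text):
        print(json.dumps(row))

With the second sample event this yields four flat JSON lines (two Event elements times two Status elements each), which is exactly the row-per-Status shape the SPL above produces at search time.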

@anirbandasdeb I am not sure at which stage of processing the input XML data you want to integrate your custom script: before indexing into Splunk (as a scripted input), or at search time (where it does not matter whether the data is XML or JSON, since the spath command works the same for both).
If you are configuring your own scripted input (you should eventually consider the Splunk Add-on Builder once the script is fully tested), the following Splunk Answers post has some leads: https://answers.splunk.com/answers/612626/integrating-splunk-dashboard-with-python.html
Do try it out and confirm. Also accept the answer / upvote the comments that helped once this works for you.
| makeresults | eval message= "Happy Splunking!!!"

@niketnilay
How do I import libraries into the script?
In the linked example they import the _mssql library. If a library is not native to the Splunk Python, how do I install it? Can I just put the library in the same folder as the script?
I have not used the Add-On Builder yet, so I am not familiar with the options it provides for deploying scripts.

Yes, you can try that.
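For what it is worth, a minimal sketch of the two usual options, assuming the scripted input lives under $SPLUNK_HOME/etc/apps/<your_app>/bin/. A pure-Python module copied into the same folder as the script is importable as-is, while a separate lib/ sub-folder needs an explicit sys.path entry; the folder name "lib" and the commented module are illustrative only. Compiled extensions such as _mssql additionally have to match the version and architecture of the Python bundled with Splunk.

import os
import sys

# Option 1: a pure-Python module placed next to this script needs nothing
# special -- the script's own directory is already on sys.path when the
# interpreter runs the script.

# Option 2: modules bundled in a sub-folder (here "lib", an arbitrary name)
# become importable after prepending that folder to sys.path.
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(SCRIPT_DIR, "lib"))

# import xmltodict   # example: a pure-Python module bundled under lib/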
| makeresults | eval message= "Happy Splunking!!!"

@ppablo_splunk, @mstjohn_splunk Thank you for updating and correcting my English!! I was absolutely exhausted after a long and grueling day and didn't stop to check my grammar.

@anirbandasdeb Just tweaked a few things! No problem! It is a good question 🙂
