I have searched Splunkbase for a complete example of how to set up an XML file as an input and split it into events at the right elements. It appears there isn't a clean method for doing this in Splunk. Every example has been a regular expression hack that does not work when I copy it and replace its XML element names with the names of the elements that contain the events I want to extract from my XML file.
The best-practice page http://wiki.splunk.com/Deploy:HowToWorkWithXMLLogFiles does not appear to work when I change the tag in the example to my tag.
Why can't I find better examples of how to use XML as a source input for Splunk? It seems that to build any kind of input I have to be a Splunk expert.
For example, some XML output has no namespaces.
.... .....
Then there are outputs that do have namespaces.
.... .....
I am not able to derive the regular expressions and work out the right combination of settings: a line break, or a line break before and only before, with the right line merge and max events allowed, plus some made-up injected line break to keep the magic parsing engine happy.
Where is the XPath-style syntax that would let me say: Splunk, if you execute this regular expression, each result will be an individual event that you can use?
end rant...
How does one define, in props.conf, a way to input XML into Splunk?
-------- more details
Let's say I have files generated by an application containing XML format logs. Each file contains a collection of events that should be processed as individual events by splunk.
For example
<!-- log data .... -->
<!-- log data .... -->
<!-- log data .... -->
<!-- log data .... -->
<!-- log data .... -->
Each EventLogItem should be an event in splunk.
Where XML differs from flat files is that the whitespace between XML elements should be ignored. The file may or may not contain pretty-printed XML.
<!-- log data .... --> <!-- log data .... --> <!-- log data .... --> <!-- log data .... --> <!-- log data .... -->
The XML examples also do not account for cases where the elements carry a namespace prefix instead of being declared under the default namespace.
Example.
This is where Splunk's use of regular expressions to parse XML becomes difficult. Regular expressions are not the easiest form of parsing for everyone to pick up and use quickly.
The regular expression for XML should not assume XML tags begin at the start of a line.
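To make that requirement concrete, here is a minimal Python sketch of the behaviour I am after. It is not Splunk configuration, and I am not claiming Splunk works this way. The wrapper element `Log`, the prefix `x`, and the namespace `urn:example` are made-up placeholders; only `EventLogItem` comes from my data. The pattern is not anchored to the start of a line and accepts an optional namespace prefix, so the same expression handles pretty-printed, single-line, and prefixed XML.

```python
import re

# Assumed pattern: one whole EventLogItem element, with or without a
# namespace prefix, found anywhere in the text (not only at line start).
# \1 makes the closing tag use the same (possibly prefixed) name.
EVENT = re.compile(
    r"<((?:[\w.-]+:)?EventLogItem)\b[^>]*>.*?</\1\s*>",
    re.DOTALL,  # let '.' span newlines inside an event
)

def split_events(text):
    """Return each EventLogItem element as its own string."""
    return [m.group(0) for m in EVENT.finditer(text)]

# Hypothetical sample inputs (element content is a placeholder).
PRETTY = "<Log>\n  <EventLogItem>a</EventLogItem>\n  <EventLogItem>b</EventLogItem>\n</Log>"
COMPACT = "<Log><EventLogItem>a</EventLogItem> <EventLogItem>b</EventLogItem></Log>"
PREFIXED = (
    '<x:Log xmlns:x="urn:example">'
    "<x:EventLogItem>a</x:EventLogItem><x:EventLogItem>b</x:EventLogItem>"
    "</x:Log>"
)

for sample in (PRETTY, COMPACT, PREFIXED):
    print(split_events(sample))
```

This sketch does not handle self-closing elements, nested elements with the same name, or a `>` inside an attribute value, which is part of why a regular expression feels like the wrong tool here.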