Getting Data In

How to configure XML data parsing?

Motivator

Hi,

I've not tried to parse XML data in Splunk so I need a bit of hand holding.... I have the following data that repeats for different sensors. I'd like to be able to extract all the XML fields. I know this needs to be done in props.conf (and maybe transforms.conf) but I don't know where to begin :).... Any help would be GREATLY appreciated! TY!!!

 <!--Crow Sensors Begin-->
    <DeviceDescriptor>
        <uuid>4434D720-A9E7-11E3-9CF2-0002A5D5C51B</uuid>
        <description>Flood Sensor</description>
        <category>zigbee</category>
        <manufacturer>Crow</manufacturer>
        <model>FLOOD-ZB</model>
        <hardwareVersions>0x1C</hardwareVersions>
        <firmwareVersions>0x01000025</firmwareVersions>
        <latestFirmware>
            <version>0x01000025</version>
            <filename>crow-flood-zb-v1.0.25.ota</filename>
            <type>ota</type>
        </latestFirmware>
    </DeviceDescriptor>

    <DeviceDescriptor>
        <uuid>4b931971-bf2a-11e3-b1b6-0800200c9a66</uuid>
        <description>Motion (PIR) Sensor</description>
        <category>zigbee</category>
        <manufacturer>Crow</manufacturer>
        <model>PIR-ZB</model>
        <hardwareVersions>0x1A</hardwareVersions>
        <firmwareVersions>0x01000025</firmwareVersions>
        <latestFirmware>
            <version>0x01000025</version>
            <filename>crow-pir-zb-v1.0.25.ota</filename>
            <type>ota</type>
        </latestFirmware>
    </DeviceDescriptor>
0 Karma
1 Solution

SplunkTrust
SplunkTrust

Ok. So if the ingestion is done in such a way that each <DeviceDescriptor> is a separate event in Splunk with valid xml syntax, the field extraction is as simple as adding KV_MODE = xml in props.conf on Search Head(s). So, main focus should be getting the data ingested correctly. Since your data doesn't have timestamp, I'm using current time as the _time value for the event.

Try this for props.conf on your Indexer/Heavy Forwarder.

[YourSourceType]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\s*\<DeviceDescriptor\>)
DATETIME_CONFIG = CURRENT

Search head props.conf

[YourSourceType]
KV_MODE =xml

View solution in original post

SplunkTrust
SplunkTrust

Ok. So if the ingestion is done in such a way that each <DeviceDescriptor> is a separate event in Splunk with valid xml syntax, the field extraction is as simple as adding KV_MODE = xml in props.conf on Search Head(s). So, main focus should be getting the data ingested correctly. Since your data doesn't have timestamp, I'm using current time as the _time value for the event.

Try this for props.conf on your Indexer/Heavy Forwarder.

[YourSourceType]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\s*\<DeviceDescriptor\>)
DATETIME_CONFIG = CURRENT

Search head props.conf

[YourSourceType]
KV_MODE =xml

View solution in original post

Explorer

worked for me but had to modify the linebreaker myself. check your XML file for syntax problems as well.
xmlint --noout filename.xml ; echo $?
If there wasn't an error with your xml syntax, you should see result code 0 and nothing but a blank return.

0 Karma

Explorer

I am have a similar problem. I have tried a number of other suggestion solutions. This is the first that states the requirement for indexer / heavy forwarder and search head configurations. I was placing these configurations on the Universal Forwarder where the XML files are being written. Why does the KV_MODE=xml needs to go on the search head? We are using Splunk Cloud so this is not an option for me.

0 Karma

SplunkTrust
SplunkTrust

Is the data already indexed in Splunk? If yes, then does each of DeviceDescriptor entry is coming as separate event or single event?

0 Karma

Motivator

Hi Somesoni2!

No, the data has not been indexed as of yet. I can test out a couple if that would be helpful. Let me know, thanks!

0 Karma