Splunk Search

How to ingest a non-standard XML file based on file-mod timestamp and extract all XML attributes?

leonheart78
Explorer

Hi,

I'm trying to ingest multiple files with the below format:

<?xml version="1.0" encoding="UTF-8"?>

<BroadcastData creationDate="20150820080127">

    <CVendorInfo>

        <CVendorId>Lysis</CVendorId>

        <CVendorName>Lysis S.A.</CVendorName>

    </CVendorInfo>

    <ScheduleData>

        <FreqPeriod endTime="20150820082900" beginTime="20150820070000">

            <FreqId>174</FreqId>

            <Event duration="1740" beginTime="20150820070000">

                <EventId>CP0012650858</EventId>

                <DvbEventId>15664</DvbEventId>

                <EventType>S</EventType>

                <PreviewTime>0</PreviewTime>

                <VpgProduction>

                    <VpgText language="eng">

                        <Name>Headline News</Name>

                        <Description>Top Stories in Headlines.</Description>

                        <ExtendedInfo name="Contentid_ref">CP0012650858</ExtendedInfo>

                        <ExtendedInfo name="AudioTrack">eng</ExtendedInfo>

                        <ExtendedInfo name="Start_over_flag">0</ExtendedInfo>

                    </VpgText>

                    <ParentalRating>0</ParentalRating>

                    <DvbContent>

                        <Content nibble2="0" nibble1="0"/>

                        <User nibble2="A" nibble1="0"/>

                    </DvbContent>

                </VpgProduction>

            </Event>

            <Event duration="1860" beginTime="20150820072900">

                <EventId>CP0012650859</EventId>

                <DvbEventId>15665</DvbEventId>

                <EventType>S</EventType>

                <PreviewTime>0</PreviewTime>

                <VpgProduction>

                    <VpgText language="eng">

                        <Name>Boom Bust</Name>

                        <Description>Booms and Bust</Description>

                        <ExtendedInfo name="Contentid_ref">CP0012650859</ExtendedInfo>

                        <ExtendedInfo name="AudioTrack">eng</ExtendedInfo>

                        <ExtendedInfo name="Start_over_flag">0</ExtendedInfo>

                    </VpgText>

                    <ParentalRating>0</ParentalRating>

                    <DvbContent>

                        <Content nibble2="0" nibble1="0"/>

                        <User nibble2="A" nibble1="0"/>

                    </DvbContent>

                </VpgProduction>

            </Event>

            <Event duration="1740" beginTime="20150820080000">

                <EventId> CP0012650860</EventId>

                <DvbEventId>15666</DvbEventId>

                <EventType>S</EventType>

                <PreviewTime>0</PreviewTime>

                <VpgProduction>

                    <VpgText language="eng">

                        <Name>Headline News</Name>

                        <Description> Top Stories in Headlines.</Description>

                        <ExtendedInfo name="Contentid_ref"> CP0012650860</ExtendedInfo>

                        <ExtendedInfo name="AudioTrack">eng</ExtendedInfo>

                        <ExtendedInfo name="Start_over_flag">0</ExtendedInfo>

                    </VpgText>

                    <ParentalRating>0</ParentalRating>

                    <DvbContent>

                        <Content nibble2="0" nibble1="0"/>

                        <User nibble2="A" nibble1="0"/>

                    </DvbContent>

                </VpgProduction>

            </Event>

        </FreqPeriod>

    </ScheduleData>

</BroadcastData>

I want to have the timestamp based on the BroadcastData creationDate, and to extract all the XML attributes. How can I go about doing this? Thanks in advance.

1 Solution

gcato
Contributor

Hi Leonheart78,

If you want to do it as part of an inline search, the "spath" command will extract the XML fields, including attributes. Refer to this answer: http://answers.splunk.com/answers/184469/how-to-extract-unique-values-from-xml-data.html
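
As a rough sketch of the spath approach: this assumes each file is indexed as a single event and uses a hypothetical sourcetype name, broadcast_xml, neither of which is confirmed by the original post. XML attributes are addressed with the {@name} syntax, and the output field names are arbitrary:

```
sourcetype=broadcast_xml
| spath output=creationDate path=BroadcastData{@creationDate}
| spath output=event_id path=BroadcastData.ScheduleData.FreqPeriod.Event.EventId
| spath output=event_begin path=BroadcastData.ScheduleData.FreqPeriod.Event{@beginTime}
| table creationDate event_id event_begin
```

Because the sample contains several Event elements, event_id and event_begin would be expected to come back as multivalue fields.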

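Splunk can also be configured to extract the XML fields automatically by setting KV_MODE=xml for the source/sourcetype in props.conf. Note that KV_MODE is a search-time setting, so it needs to be in props.conf on the instance that runs the searches (the search head, or the indexer if it also serves searches). Refer to http://answers.splunk.com/answers/129820/extract-xml-field.html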
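
A minimal props.conf sketch along these lines, which also takes the event timestamp from creationDate at index time. The stanza name broadcast_xml is a hypothetical sourcetype, and the event-breaking settings assume each file should become one event; adjust both to your data:

```
# props.conf
# [broadcast_xml] is a hypothetical sourcetype name, not from the original post.
[broadcast_xml]

# Index-time settings (indexer or heavy forwarder):
# treat everything up to the next XML declaration as one event.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)<\?xml
# Assumption: files can exceed the default line-length limit; 0 disables truncation.
TRUNCATE = 0

# Read the timestamp from <BroadcastData creationDate="20150820080127">.
TIME_PREFIX = <BroadcastData\s+creationDate="
TIME_FORMAT = %Y%m%d%H%M%S
MAX_TIMESTAMP_LOOKAHEAD = 14

# Search-time setting (search head): automatic XML field extraction.
KV_MODE = xml
```

The timestamp and event-breaking settings only affect data indexed after the change, so already-indexed files would have to be re-ingested to pick them up.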

Use either the rex command or, ideally, a configured field extraction to pull the creation time out of the creationDate attribute of the BroadcastData element, e.g.

... | rex field=_raw "creationDate=\"(?<bc_creationDate>[^\"]+)\""
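
If the timestamp only needs to change at search time, the extracted string can be converted with strptime and assigned to _time. This is a sketch that assumes the creationDate value always has the 14-digit YYYYMMDDhhmmss layout shown in the sample; the field name bc_creationDate is just the capture-group name chosen for the rex:

```
... | rex field=_raw "creationDate=\"(?<bc_creationDate>[^\"]+)\""
    | eval _time = strptime(bc_creationDate, "%Y%m%d%H%M%S")
```

This only overrides _time in the search results; the indexed timestamp of the event is unchanged.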
