Problem with XML input

mblauw
Path Finder

I've got a problem with my XML input. I've tried several settings, but can't seem to find the right ones. Here's a sample of my data:

<?xml version="1.0" encoding="UTF-8"?>



<ActueleVertrekTijden>

    <VertrekkendeTrein>
        <RitNummer>781</RitNummer>
        <VertrekTijd>2017-03-24T22:04:00+0100</VertrekTijd>

            <VertrekVertraging>PT7M</VertrekVertraging>


            <VertrekVertragingTekst>+7 min</VertrekVertragingTekst>

        <EindBestemming>Groningen</EindBestemming>
        <TreinSoort>Intercity</TreinSoort>

            <RouteTekst>A'dam Zuid, Almere C., Lelystad C.</RouteTekst>


            <Vervoerder>NS</Vervoerder>

        <VertrekSpoor wijziging="false">1-2</VertrekSpoor>


    </VertrekkendeTrein>

    <VertrekkendeTrein>
        <RitNummer>11683</RitNummer>
        <VertrekTijd>2017-03-24T22:05:00+0100</VertrekTijd>


        <EindBestemming>Amersfoort Schothorst</EindBestemming>
        <TreinSoort>Intercity</TreinSoort>

            <RouteTekst>A'dam Zuid, Hilversum, Amersfoort</RouteTekst>


            <Vervoerder>NS</Vervoerder>

        <VertrekSpoor wijziging="false">3</VertrekSpoor>


    </VertrekkendeTrein>

</ActueleVertrekTijden>

I've currently got these settings in props.conf:

[ns_api]
CHARSET=AUTO
LINE_BREAKER=>\s*(?=)
REPORT-xmlext=xml-extr
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y-%m-%dT%H:%M:%S
TIME_PREFIX=\
disabled=false
pulldown_type=true

And these settings in transforms.conf:

[xml-extr]
REGEX = <([^>]+)>([^<]*)<\/\1>
FORMAT = $1::$2
MV_ADD = true
REPEAT_MATCH = true

Setting SHOULD_LINEMERGE to true already helps a bit, but the data is still not being split into separate events correctly.


DMohn
Motivator

Try the following setting in your transforms.conf:

REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>

(further reference here: https://answers.splunk.com/answers/133533/xml-extraction.html?utm_source=typeahead&utm_medium=newque...)

This will at least capture all the "regular" XML elements, including ones whose opening tag carries an attribute, such as VertrekSpoor.
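To see what the changed regex buys you, here is a small sketch that runs both patterns over a trimmed copy of the sample data. It uses Python's re module as a stand-in for Splunk's regex engine, so treat it as an approximation rather than a statement of what Splunk does internally; the variable names are made up for the illustration.

```python
import re

# Trimmed copy of one event from the sample data in the question.
sample = """<VertrekkendeTrein>
    <RitNummer>781</RitNummer>
    <EindBestemming>Groningen</EindBestemming>
    <VertrekSpoor wijziging="false">1-2</VertrekSpoor>
</VertrekkendeTrein>"""

# REGEX from the original transforms.conf: the whole tag content, attributes
# included, must reappear in the closing tag, so tags with attributes fail.
original = re.compile(r"<([^>]+)>([^<]*)<\/\1>")

# Suggested REGEX: capture only the tag name, then skip any attributes.
suggested = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")

original_fields = dict(original.findall(sample))
suggested_fields = dict(suggested.findall(sample))

print("original :", original_fields)
print("suggested:", suggested_fields)
```

In this sketch only the suggested pattern picks up VertrekSpoor. Neither pattern extracts the value of the wijziging attribute itself.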

You also have some other settings that could be improved. Because your events span multiple lines, you should set line merging to true. The timestamp extraction may also be wrong, since your TIME_FORMAT does not include the UTC offset (+0100).

Setting these options in your props.conf could help:

SHOULD_LINEMERGE = true
TIME_PREFIX = <VertrekTijd>
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%z
LINE_BREAKER = ([\r\n]+)\s*<VertrekkendeTrein>
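As a rough way to check the intent of these settings outside Splunk, the sketch below splits a shortened copy of the sample before each opening VertrekkendeTrein tag and parses the VertrekTijd value with the same format string. Python's re.split and datetime.strptime are only stand-ins here; Splunk's line breaking and timestamp parsing are separate implementations and may differ in detail.

```python
import re
from datetime import datetime

# Shortened copy of the sample data from the question.
raw = (
    "<ActueleVertrekTijden>\n"
    "\n"
    "    <VertrekkendeTrein>\n"
    "        <RitNummer>781</RitNummer>\n"
    "        <VertrekTijd>2017-03-24T22:04:00+0100</VertrekTijd>\n"
    "    </VertrekkendeTrein>\n"
    "\n"
    "    <VertrekkendeTrein>\n"
    "        <RitNummer>11683</RitNummer>\n"
    "        <VertrekTijd>2017-03-24T22:05:00+0100</VertrekTijd>\n"
    "    </VertrekkendeTrein>\n"
    "\n"
    "</ActueleVertrekTijden>\n"
)

# Break on the newlines (plus indentation) that precede each opening tag.
# The lookahead keeps the tag itself at the start of the next chunk.
chunks = re.split(r"[\r\n]+\s*(?=<VertrekkendeTrein>)", raw)

# The first chunk is only the root element's opening tag, so drop it.
# The last event still carries the closing </ActueleVertrekTijden> tag.
events = [c for c in chunks if c.startswith("<VertrekkendeTrein>")]

# Parse the text after <VertrekTijd>, including the +0100 offset via %z.
stamps = []
for event in events:
    match = re.search(r"<VertrekTijd>([^<]+)", event)
    stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S%z"))

for stamp in stamps:
    print(stamp.isoformat())
```

Each VertrekkendeTrein block ends up as its own event with a timezone-aware timestamp, which is the outcome the props.conf settings are aiming for.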

You will still be left with some unextracted fields inside the tags themselves (attributes such as wijziging), but you can add further extraction regexes for these.
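One possible shape for such an additional extraction is sketched below, again in Python as a stand-in for Splunk's regex engine. The pattern and the tag_attribute field naming (VertrekSpoor_wijziging) are assumptions for illustration only, and the pattern only handles the first double-quoted attribute on a tag.

```python
import re

# One line from the sample data that carries an attribute.
event = '<VertrekSpoor wijziging="false">1-2</VertrekSpoor>'

# Hypothetical pattern: tag name, attribute name, double-quoted value.
attr_pattern = re.compile(r'<(\w+)\s+(\w+)="([^"]*)"')

# Build field names as <tag>_<attribute>; this naming is an assumption.
attrs = {
    f"{tag}_{name}": value
    for tag, name, value in attr_pattern.findall(event)
}

print(attrs)
```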
