Splunk Search

How to extract XML data from mixed content into one field for later use with spath?

roshannon
New Member

I have a mixed output log that contains XML and non-XML data. I am looking to extract the XML data into a field that I can later run spath on to get individual fields. My sample data is below. I want to capture the entire <root>...</root> block in a single field, so that I can later use spath to pull out the individual fields I might want to search on. I have seen other recommendations to put XML into a single field for later spath usage, but did not see how to do that.

2015 May 22 15:23:44:024 GMT -0700 BW.DomainDMSEvents-DomainDMSEvents-P01 User [BW-User] - Job-10003 [UtilityProcesses/CreateAuditTrail.process/Log]: AuditTrail: 10003|Projects/DomainDMSEvents/ProcDefs/Starters/PublishDMSScanEvents.process||file|||2015-05-22T15:23:44.022-07:00|DomainDMSEvents-DomainDMSEvents-P01||||false||
|<root>
    <messageIn>
        <channel>file</channel>
        <msgID>1432333424013</msgID>
        <corlID>1432333424013</corlID>
        <raw><?xml version="1.0" encoding="UTF-8"?>
   <ns0:EventSourceOuputNoContentClass xmlns:ns0="http://www.tibco.com/namespaces/tnt/plugins/file"><action>remove</action><timeOccurred>1432333424013</timeOccurred><fileInfo><fullName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</fullName><fileName>DMSEvents.txt</fileName><location>/nfs/appdata/CTSE/OMS/DMS</location><configuredFileName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</configuredFileName><type>file</type><readProtected>true</readProtected><writeProtected>true</writeProtected><size>5651</size><lastModified>2015-05-20T12:07:28-07:00</lastModified></fileInfo></ns0:EventSourceOuputNoContentClass></raw>
            <EMSHeaderProperties>
                <header>
                    <name>fileNewName</name>
                    <value>/nfs/appdata/CTSE/OMS/DMS/processed/DMSEvents.txt</value>
                </header>
                <header>
                    <name>fileName</name>
                    <value>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</value>
                </header>
                <header>
                    <name>timestamp</name>
                    <value>1432333424017</value>
                </header>
            </EMSHeaderProperties>
            <parsed>
                <type>filePoller</type>
                <other/>
            </parsed>
        </messageIn>
        <messageOut>
            <name>DocImageEvent</name>
            <TXInfo>
                <tranType>DocImageEvent</tranType>
                <evtType>DocImageEvent</evtType>
                <topicOverride>Domain.CTS.CTSE.Canonical.S2C.DomainDMSEvents.DocImageEvent</topicOverride>
            </TXInfo>
        </messageOut>
        <psDef>
            <funcArea>S2C</funcArea>
            <appSource>DomainDMSEvents</appSource>
            <txIdentifier>DocImageEvent</txIdentifier>
            <startTS>1432333424017</startTS>
        </psDef>
    </root>|
1 Solution

maciep
Champion

Not sure how consistent that log format is, but something like this seems to work for me in a limited test env. I'm just using rex to grab the <root>...</root> portion of the event and throw it in a field called xml_field:

... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"
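Once xml_field is populated, spath can read from it with the input option. A minimal follow-on sketch, assuming the extraction above and the sample event from the question (the paths and the output field names channel and tranType are illustrations taken from that sample, not required names):

... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"
    | spath input=xml_field output=channel path=root.messageIn.channel
    | spath input=xml_field output=tranType path=root.messageOut.TXInfo.tranType

A note on the regex: [\s\S] matches newlines as well, so the capture spans the multi-line XML without needing a (?s) modifier. If an event could ever contain more than one <root> block, the greedy [\s\S]+ would run from the first opening tag to the last closing tag; a lazy [\s\S]+? would stop at the first closing tag instead.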
