How to extract XML data from mixed content into one field for later use with spath?

roshannon
New Member

I have a log with mixed output that contains both XML and non-XML data. I want to extract the XML portion into a single field so that I can later run spath on it to pull out individual fields. My sample data is below. Specifically, I want everything from <root> through </root> in one field. I have seen other recommendations to put the XML into a single field for later spath usage, but none of them showed how to do that.

2015 May 22 15:23:44:024 GMT -0700 BW.DomainDMSEvents-DomainDMSEvents-P01 User [BW-User] - Job-10003 [UtilityProcesses/CreateAuditTrail.process/Log]: AuditTrail: 10003|Projects/DomainDMSEvents/ProcDefs/Starters/PublishDMSScanEvents.process||file|||2015-05-22T15:23:44.022-07:00|DomainDMSEvents-DomainDMSEvents-P01||||false||
|<root>
    <messageIn>
        <channel>file</channel>
        <msgID>1432333424013</msgID>
        <corlID>1432333424013</corlID>
        <raw><?xml version="1.0" encoding="UTF-8"?>
   <ns0:EventSourceOuputNoContentClass xmlns:ns0="http://www.tibco.com/namespaces/tnt/plugins/file"><action>remove</action><timeOccurred>1432333424013</timeOccurred><fileInfo><fullName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</fullName><fileName>DMSEvents.txt</fileName><location>/nfs/appdata/CTSE/OMS/DMS</location><configuredFileName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</configuredFileName><type>file</type><readProtected>true</readProtected><writeProtected>true</writeProtected><size>5651</size><lastModified>2015-05-20T12:07:28-07:00</lastModified></fileInfo></ns0:EventSourceOuputNoContentClass></raw>
            <EMSHeaderProperties>
                <header>
                    <name>fileNewName</name>
                    <value>/nfs/appdata/CTSE/OMS/DMS/processed/DMSEvents.txt</value>
                </header>
                <header>
                    <name>fileName</name>
                    <value>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</value>
                </header>
                <header>
                    <name>timestamp</name>
                    <value>1432333424017</value>
                </header>
            </EMSHeaderProperties>
            <parsed>
                <type>filePoller</type>
                <other/>
            </parsed>
        </messageIn>
        <messageOut>
            <name>DocImageEvent</name>
            <TXInfo>
                <tranType>DocImageEvent</tranType>
                <evtType>DocImageEvent</evtType>
                <topicOverride>Domain.CTS.CTSE.Canonical.S2C.DomainDMSEvents.DocImageEvent</topicOverride>
            </TXInfo>
        </messageOut>
        <psDef>
            <funcArea>S2C</funcArea>
            <appSource>DomainDMSEvents</appSource>
            <txIdentifier>DocImageEvent</txIdentifier>
            <startTS>1432333424017</startTS>
        </psDef>
    </root>|

maciep
Champion

I'm not sure how consistent that log format is, but something like this seems to work for me in a limited test environment. I'm just using rex to grab the <root>...</root> portion of the event and put it in a field called xml_field.

... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"
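For the follow-up step the question asks about, here is a rough sketch of how the extracted field could be passed to spath. The paths follow the element nesting in the sample event. The output names funcArea and messageName, and the final table, are only illustrative choices. I have not checked how spath handles the embedded <?xml ...?> declaration inside <raw>, so treat this as a starting point rather than a verified search.

```
... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"
    | spath input=xml_field path=root.psDef.funcArea output=funcArea
    | spath input=xml_field path=root.messageOut.name output=messageName
    | table _time funcArea messageName
```

Running spath input=xml_field with no path argument should instead auto-extract every element it can parse, which may be easier while exploring the data. Also note that [\s\S]+ is greedy, so if an event ever contained more than one <root> block, the capture would run from the first <root> to the last </root>.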
