Splunk Search

How to extract Title from this XML data

tkwaller1
Path Finder

Hello

I have a very long XML record that I am trying to extract some data from with spath, but I can't seem to get it to work. Can someone give me some assistance?
Here's what the record looks like (sorry, it's SUPER long):

 

 

2024-01-08 12:09:43.000, LOAD_DATE="2024-01-08 12:09:43.0", EVENT_LENGTH="14912", ID="3f29f958-af6e-4050-919e-fb23fc27e2bc", MSG_src="PXXXX", MSG_DOMAIN="APP", MSG_TYPE="INBOUND", MSG_DATA="<?xml version='1.0' encoding='UTF-8'?>
<Message>
  <header>
    <domain>APP</domain>
    <source>PXXXX</source>
    <messageType>INBOUND</messageType>
    <eventId>f8y6jk45-af6e-4050-919e-fb23fc27e2bc</eventId>
  </header>
  <parsing>
    <parsingStatus>SUCCESS</parsingStatus>
    <parsingStatusDesc>Success</parsingStatusDesc>
    <formType>1234</formType>
  </parsing>
  <ABC>
    <Code>ABC</Code>
    <Number>209819</Number>
    <sequence>0236</sequence>
    <ReceiptDate>2024-01-08T00:00:00.000-05:00</ReceiptDate>
    <FirstDate>2024-01-08T00:00:00.000-05:00</FirstDate>
    <Status>SUCCESS</Status>
    <location>xxxxxxxx</location>
    <id>ci1704729189245.431902@fdsahl86ceb40c</id>
    <format>ABCD</format>
  </ABC>
  <applicationDetails>
    <applicationGlobalId>500168938</applicationGlobalId>
    <applicationType>ABC</applicationType>
    <applicationSubtype>UNKNOWN</applicationSubtype>
    <applicationNumber>123456</applicationNumber>
    <applicationRelationships>
      <applicationRelationship>
        <ReasonCode>XYZ</ReasonCode> 
		<Desc>BLAH BLAH BLAH</Desc>
        <applicationGlobalId>123456789</applicationGlobalId>
        <applicationNumber>123456</applicationNumber>
        <applicationSubtype>UNKNOWN</applicationSubtype>
        <applicationType>RED</applicationType>
      </applicationRelationship>
    </applicationRelationships>
    <applicationPatents/>
    <applicationStatuses>
      <applicationStatus>
        <statusCode>APPROVED</statusCode>
        <statusDescription>APPROVED</statusDescription>
        <statusStartDate>2017-11-30T00:00:00.000-05:00</statusStartDate>
      </applicationStatus>
    </applicationStatuses>
    <applicationProperties/>
  </applicationDetails>
  <InboundDetails>
    <InboundType>Reply</InboundType>
    <InboundSubtype>Reply2</InboundSubtype>
    <InboundSequenceNumber>0236</InboundSequenceNumber>
  </InboundDetails>
  <form>
    <attributes>123-4560910-0001"/>
      <attribute description="EXPIRATION DATE" name="Expiration Date" value="03/31/2024"/>
      <attribute description="name" name="name_holder" value="Place Inc."/>
      <attribute description="NUMBER" name="number" value="209819"/>
      <attribute description="Bunch of strings" name="Desc"/>
    </attributes>
    <List>
      <items/>
    </List>
    <infoList>
      <info>
        <Type>Information goes here</Type>
        <name>Me Formal</name>
        <phoneNumber>+1 (111) 222-333</phoneNumber>
        <addressLine1>1234 Road Drive</addressLine1>
        <city>Place, MO</city>
        <zipCode>12345</zipCode>
        <emailAddress>me.formal@domain.com</emailAddress>
        <partyContacts>
          <partyContact>
			<Date>2024-01-04T00:00:00.000-05:00</Date>
            <state>MO</state>
            <emailAddress>me.formal@domain.com</emailAddress>
            <addressLine1>1234 Road Drive</addressLine1>
            <city>Place</city>
            <country>UNITED STATES</country>
            <phoneNumber>+1 (111) 222-333</phoneNumber>
            <zipCode>12345</zipCode>
            <name>Me Formal</name>
            <contactType>United States</contactType>
          </partyContact>
        </partyContacts>
      </info>
    </infoList>
  </form>
  <Information>
    <Number>11,222,333</Number>
    <IssueDate>2023-12-12</IssueDate>
    <ApprovalDate>2017-11-30</ApprovalDate>
    <ExpirationDate>2035-11-06</ExpirationDate>
    <SubType>Y</SubType>
    <Status>SUCCESS</Status>
  </Information>
  <index/>
  <additionalInfo>
    <attributes>
      <attribute description="title" name="title" value="Letter"/>
    </attributes>
    <fileDetails>
      <fileDetail>
        <Toc>application||form</Toc>
        <title>FABDC REDS</title>
        <fileName>file.pdf</fileName>
        <fileType>pdf</fileType>
        <formType>Long sting of data</formType>
        <filePath>\\filepath\file.pdf</filePath>
      </fileDetail>
      <fileDetail>
        <abcdToc>v1-place||v1-2-file-name</abcdToc>
        <title>Letter</title>
        <fileName>letter.pdf</fileName>
        <fileType>pdf</fileType>
        <filePath>\\us\letter.pdf</filePath>
      </fileDetail>
      <fileDetail>
        <abcdToc>information</abcdToc>
        <title>11-222-333</title>
        <fileName>11-222-333.pdf</fileName>
        <fileType>pdf</fileType>
        <filePath>\\ab\11-222-333.pdf</filePath>
      </fileDetail>
    </fileDetails>
    <tags/>
  </additionalInfo>
</Message>"

 

 

In the end, I am trying to get the data from the "<fileDetails>" section, specifically the "<title>" for each file. The result would have to be multi-value, since a single record may contain one OR several titles.

I've tried a few variations of spath, as well as xmlkv, but as of yet haven't found anything that has given me the results I am expecting.
For the example above I would expect to have 3 "Titles":

 

 

FABDC REDS
Letter
11-222-333

 

 

Any ideas how to get this data out?


Thanks for the help!


richgalloway
SplunkTrust

To extract a single field from the event, I'd use the rex command.  It will give you a multi-value field with all of the title values.

| rex max_match=0 "\<title>(?<title>[^\<]+)"
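
To try the pattern without the production data, here is a minimal sketch built on `makeresults`. The shortened `_raw` string and the `title_count` field are assumptions made up for illustration; only the `rex` line comes from the answer above (with the optional backslashes before `<` dropped).

```spl
| makeresults
| eval _raw="<fileDetails><fileDetail><title>FABDC REDS</title></fileDetail><fileDetail><title>Letter</title></fileDetail><fileDetail><title>11-222-333</title></fileDetail></fileDetails>"
| rex max_match=0 "<title>(?<title>[^<]+)"
| eval title_count=mvcount(title)
| table title title_count
```

With `max_match=0`, rex keeps every match rather than only the first, so `title` should come back as a multi-value field and `title_count` should equal the number of `<title>` elements in the test string.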

 

---
If this reply helps you, Karma would be appreciated.


yuanliu
SplunkTrust

Unfortunately, Splunk cannot automatically extract MSG_DATA correctly because the XML document itself contains double quotes, which end the quoted field value early.  If MSG_DATA is always the last field in the event, you can use

 

| eval MSG_DATA = replace(_raw, ".+,\s*MSG_DATA=\"|\"$", "")
| spath input=MSG_DATA path=Message.additionalInfo.fileDetails.fileDetail.title
| table Message.additionalInfo.fileDetails.fileDetail.title

 

Your sample data (which includes an invalid fragment that I removed) results in

Message.additionalInfo.fileDetails.fileDetail.title
FABDC REDS
Letter
11-222-333
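
The same approach can be reproduced end to end with a synthetic event. This is only a sketch: the shortened `_raw` value is an assumption standing in for the real record, and `output=title` plus `mvexpand` are optional additions (not part of the search above) that give a shorter field name and one row per title.

```spl
| makeresults
| eval _raw="2024-01-08 12:09:43.000, MSG_TYPE=\"INBOUND\", MSG_DATA=\"<Message><additionalInfo><attributes><attribute name=\"title\" value=\"Letter\"/></attributes><fileDetails><fileDetail><title>FABDC REDS</title></fileDetail><fileDetail><title>Letter</title></fileDetail></fileDetails></additionalInfo></Message>\""
| eval MSG_DATA = replace(_raw, ".+,\s*MSG_DATA=\"|\"$", "")
| spath input=MSG_DATA path=Message.additionalInfo.fileDetails.fileDetail.title output=title
| mvexpand title
| table title
```

The `replace` call strips everything up to and including `MSG_DATA="` as well as the closing quote at the end of the event, which leaves the inner attribute quotes intact for spath. Drop the `mvexpand` line if a single multi-value field is preferred.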

 

Normally, I advise against treating structured data as text.  But if you cannot be certain that MSG_DATA is the last field, and cannot be certain of the exact terms that follow MSG_DATA, rex as @richgalloway suggested would be more stable.



tkwaller1
Path Finder
max_match=0


That's what I didn't include; I completely spaced on that option. Thanks as always!

dtburrows3
Builder

Assuming the field MSG_DATA is properly extracted and is a valid XML object, I think this SPL will get you an MV field named "file_title".

<base_search>
    | eval
        file_title=coalesce(spath(MSG_DATA, "Message.additionalInfo.fileDetails{}.fileDetail.title"), spath(MSG_DATA, "Message.additionalInfo.fileDetails.fileDetail.title"))

[Screenshot of it on my local instance: dtburrows3_0-1704900980599.png]
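
For anyone who wants to check the eval form of `spath` in isolation, a small sketch follows. The hand-written `MSG_DATA` value and the `title_count` field are assumptions for illustration; the path is the second one from the `coalesce` above.

```spl
| makeresults
| eval MSG_DATA="<Message><additionalInfo><fileDetails><fileDetail><title>FABDC REDS</title></fileDetail><fileDetail><title>Letter</title></fileDetail></fileDetails></additionalInfo></Message>"
| eval file_title=spath(MSG_DATA, "Message.additionalInfo.fileDetails.fileDetail.title")
| eval title_count=mvcount(file_title)
| table file_title title_count
```

If MSG_DATA holds well-formed XML, `file_title` should contain one value per `<fileDetail>` element. If the field was cut off during extraction, the path is unlikely to match and `file_title` will probably be empty.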

 


tkwaller1
Path Finder

I like this answer. Unfortunately, I am going to have to update the props for this, since it is not being recognized as a valid XML object and therefore doesn't work. Thanks for the assistance, I greatly appreciate your help!
