How to extract Title from this XML data

tkwaller1
Path Finder

Hello

I have a very long XML record that I am trying to pull some data from with spath, but I can't seem to get it to work. Can someone give me some assistance?
Here's what the record looks like (sorry, it's SUPER long):

 

 

2024-01-08 12:09:43.000, LOAD_DATE="2024-01-08 12:09:43.0", EVENT_LENGTH="14912", ID="3f29f958-af6e-4050-919e-fb23fc27e2bc", MSG_src="PXXXX", MSG_DOMAIN="APP", MSG_TYPE="INBOUND", MSG_DATA="<?xml version='1.0' encoding='UTF-8'?>
<Message>
  <header>
    <domain>APP</domain>
    <source>PXXXX</source>
    <messageType>INBOUND</messageType>
    <eventId>f8y6jk45-af6e-4050-919e-fb23fc27e2bc</eventId>
  </header>
  <parsing>
    <parsingStatus>SUCCESS</parsingStatus>
    <parsingStatusDesc>Success</parsingStatusDesc>
    <formType>1234</formType>
  </parsing>
  <ABC>
    <Code>ABC</Code>
    <Number>209819</Number>
    <sequence>0236</sequence>
    <ReceiptDate>2024-01-08T00:00:00.000-05:00</ReceiptDate>
    <FirstDate>2024-01-08T00:00:00.000-05:00</FirstDate>
    <Status>SUCCESS</Status>
    <location>xxxxxxxx</location>
    <id>ci1704729189245.431902@fdsahl86ceb40c</id>
    <format>ABCD</format>
  </ABC>
  <applicationDetails>
    <applicationGlobalId>500168938</applicationGlobalId>
    <applicationType>ABC</applicationType>
    <applicationSubtype>UNKNOWN</applicationSubtype>
    <applicationNumber>123456</applicationNumber>
    <applicationRelationships>
      <applicationRelationship>
        <ReasonCode>XYZ</ReasonCode> 
		<Desc>BLAH BLAH BLAH</Desc>
        <applicationGlobalId>123456789</applicationGlobalId>
        <applicationNumber>123456</applicationNumber>
        <applicationSubtype>UNKNOWN</applicationSubtype>
        <applicationType>RED</applicationType>
      </applicationRelationship>
    </applicationRelationships>
    <applicationPatents/>
    <applicationStatuses>
      <applicationStatus>
        <statusCode>APPROVED</statusCode>
        <statusDescription>APPROVED</statusDescription>
        <statusStartDate>2017-11-30T00:00:00.000-05:00</statusStartDate>
      </applicationStatus>
    </applicationStatuses>
    <applicationProperties/>
  </applicationDetails>
  <InboundDetails>
    <InboundType>Reply</InboundType>
    <InboundSubtype>Reply2</InboundSubtype>
    <InboundSequenceNumber>0236</InboundSequenceNumber>
  </InboundDetails>
  <form>
    <attributes>123-4560910-0001"/>
      <attribute description="EXPIRATION DATE" name="Expiration Date" value="03/31/2024"/>
      <attribute description="name" name="name_holder" value="Place Inc."/>
      <attribute description="NUMBER" name="number" value="209819"/>
      <attribute description="Bunch of strings" name="Desc"/>
    </attributes>
    <List>
      <items/>
    </List>
    <infoList>
      <info>
        <Type>Information goes here</Type>
        <name>Me Formal</name>
        <phoneNumber>+1 (111) 222-333</phoneNumber>
        <addressLine1>1234 Road Drive</addressLine1>
        <city>Place, MO</city>
        <zipCode>12345</zipCode>
        <emailAddress>me.formal@domain.com</emailAddress>
        <partyContacts>
          <partyContact>
			<Date>2024-01-04T00:00:00.000-05:00</Date>
            <state>MO</state>
            <emailAddress>me.formal@domain.com</emailAddress>
            <addressLine1>1234 Road Drive</addressLine1>
            <city>Place</city>
            <country>UNITED STATES</country>
            <phoneNumber>+1 (111) 222-333</phoneNumber>
            <zipCode>12345</zipCode>
            <name>Me Formal</name>
            <contactType>United States</contactType>
          </partyContact>
        </partyContacts>
      </info>
    </infoList>
  </form>
  <Information>
    <Number>11,222,333</Number>
    <IssueDate>2023-12-12</IssueDate>
    <ApprovalDate>2017-11-30</ApprovalDate>
    <ExpirationDate>2035-11-06</ExpirationDate>
    <SubType>Y</SubType>
    <Status>SUCCESS</Status>
  </Information>
  <index/>
  <additionalInfo>
    <attributes>
      <attribute description="title" name="title" value="Letter"/>
    </attributes>
    <fileDetails>
      <fileDetail>
        <Toc>application||form</Toc>
        <title>FABDC REDS</title>
        <fileName>file.pdf</fileName>
        <fileType>pdf</fileType>
        <formType>Long sting of data</formType>
        <filePath>\\filepath\file.pdf</filePath>
      </fileDetail>
      <fileDetail>
        <abcdToc>v1-place||v1-2-file-name</abcdToc>
        <title>Letter</title>
        <fileName>letter.pdf</fileName>
        <fileType>pdf</fileType>
        <filePath>\\us\letter.pdf</filePath>
      </fileDetail>
      <fileDetail>
        <abcdToc>information</abcdToc>
        <title>11-222-333</title>
        <fileName>11-222-333.pdf</fileName>
        <fileType>pdf</fileType>
        <filePath>\\ab\11-222-333.pdf</filePath>
      </fileDetail>
    </fileDetails>
    <tags/>
  </additionalInfo>
</Message>"

 

 

In the end, I am trying to get the data from the "<fileDetails>" section, specifically the "<title>" of each file. The result has to be a multi-value field, since a single record may contain one OR several titles.

I've tried a few variations of spath, as well as xmlkv, but as of yet haven't found anything that has given me the results I am expecting.
For the example above I would expect to have 3 "Titles":

 

 

FABDC REDS
Letter
11-222-333

 

 

Any ideas how to get this data out?


Thanks for the help!

1 Solution

richgalloway
SplunkTrust

To extract a single field from the event, I'd use the rex command. With max_match=0 (no limit on the number of matches), it will give you a multi-value field with all of the title values.

| rex max_match=0 "\<title>(?<title>[^\<]+)"
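
For anyone who wants to try this without the original data, here is a minimal run-anywhere sketch. The event is a hypothetical, heavily shortened stand-in for the poster's record (built with makeresults), and title_count is a helper field added only for illustration. Note that the pattern matches every <title> element in the event, not just those under <fileDetails>; in the sample record those happen to be the only ones.

```
| makeresults
| eval _raw="<fileDetails><fileDetail><title>FABDC REDS</title><fileName>file.pdf</fileName></fileDetail><fileDetail><title>Letter</title></fileDetail><fileDetail><title>11-222-333</title></fileDetail></fileDetails>"
| rex max_match=0 "\<title>(?<title>[^\<]+)"
| eval title_count=mvcount(title)
| table title title_count
```

Without max_match=0, rex keeps only the first match, so title would hold a single value.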

 

---
If this reply helps you, Karma would be appreciated.

yuanliu
SplunkTrust

Unfortunately, Splunk cannot automatically extract MSG_DATA correctly because the XML document contains double quotes.  If MSG_DATA is always the last field in the event, you can use:

 

| eval MSG_DATA = replace(_raw, ".+,\s*MSG_DATA=\"|\"$", "")
| spath input=MSG_DATA path=Message.additionalInfo.fileDetails.fileDetail.title
| table Message.additionalInfo.fileDetails.fileDetail.title

 

Your sample data (which includes an invalid fragment that I removed) results in

Message.additionalInfo.fileDetails.fileDetail.title
FABDC REDS
Letter
11-222-333
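
The same two steps can be tried as a run-anywhere search. This is only a sketch under stated assumptions: the event below is a hypothetical, much shorter version of the sample (MSG_DATA is the last field, and the XML itself contains no double quotes), and output=title is added just to give the result a shorter field name.

```
| makeresults
| eval _raw="2024-01-08 12:09:43.000, ID=\"abc\", MSG_DATA=\"<?xml version='1.0' encoding='UTF-8'?><Message><additionalInfo><fileDetails><fileDetail><title>FABDC REDS</title></fileDetail><fileDetail><title>Letter</title></fileDetail></fileDetails></additionalInfo></Message>\""
| eval MSG_DATA = replace(_raw, ".+,\s*MSG_DATA=\"|\"$", "")
| spath input=MSG_DATA path=Message.additionalInfo.fileDetails.fileDetail.title output=title
| table title
```

The replace() call strips everything up to and including the opening MSG_DATA=" as well as the closing quote at the end of the event, so spath receives only the XML document.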

 

Normally, I advise against treating structured data as text.  But if you cannot be certain that MSG_DATA is the last field and cannot be certain of the exact terms that follow MSG_DATA, rex as @richgalloway suggested would be more stable.



tkwaller1
Path Finder
max_match=0


That's what I didn't include; I completely spaced on that option. Thanks as always!

dtburrows3
Builder

Assuming the field MSG_DATA is properly extracted and is a valid XML object, I think this SPL will give you a multi-value field named "file_title".

<base_search>
    | eval
        file_title=coalesce(spath(MSG_DATA, "Message.additionalInfo.fileDetails{}.fileDetail.title"), spath(MSG_DATA, "Message.additionalInfo.fileDetails.fileDetail.title"))
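
A run-anywhere sketch of the same expression, for testing without the original events. The MSG_DATA value here is a hypothetical, trimmed-down document that keeps only the fileDetails branch; title_count and the mvexpand step are additions for illustration, showing how the multi-value field might be counted and then split into one row per title.

```
| makeresults
| eval MSG_DATA="<Message><additionalInfo><fileDetails><fileDetail><title>FABDC REDS</title></fileDetail><fileDetail><title>Letter</title></fileDetail><fileDetail><title>11-222-333</title></fileDetail></fileDetails></additionalInfo></Message>"
| eval file_title=coalesce(spath(MSG_DATA, "Message.additionalInfo.fileDetails{}.fileDetail.title"), spath(MSG_DATA, "Message.additionalInfo.fileDetails.fileDetail.title"))
| eval title_count=mvcount(file_title)
| mvexpand file_title
| table file_title title_count
```

If MSG_DATA is cut off at the first double quote inside the XML, as happens with the automatic extraction on the real events, both spath() calls should come back empty and file_title will be null.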


tkwaller1
Path Finder

I like this answer. Unfortunately, I am going to have to update the props first, since the data is not being recognized as a valid XML object and so this doesn't work as is. Thanks for the assistance, I greatly appreciate your help!
