Splunk Search

How to extract multiple values from XML logs and display all events where FieldA is not equal to FieldB?

anilkamath
Engager

I have some XML responses logged in Splunk that are pretty deeply nested. Let's say each event contains multiple records of this form:

<records>
      <record>
        <Full Name>Ms. Brown Grimes</Full Name>
        <Country>Dronning Maud Land</Country>
        <NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail>
        <Created At>Fri Aug 25 1989 22:17:00 GMT-0700 (Pacific Daylight Time)</Created At>
        <Id>10</Id>
        <Email>Sam.Lemke@mckenzie.info</Email>
      </record>
      <record>
        <Full Name>Irma Ledner I</Full Name>
        <Country>Vatican City</Country>
        <NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail>
        <Created At>Tue Nov 30 1993 08:16:58 GMT-0800 (Pacific Standard Time)</Created At>
        <Id>12</Id>
        <Email>Gabrielle@myrl.biz</Email>
      </record>
    </records>

Now I want to find all records where NotificationEmail is not equal to Email.

What I tried was piping the events to a regex extraction:

rex "<record.*NotificationEmail>(?<nemail>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<.*Email>(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<"

where \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b is the regex I use to match an email address.

mIliofotou_splu
Splunk Employee

You can let Splunk extract all the XML fields automatically at search time by changing the props.conf file in the app of interest (say, search).

Here is a stanza example:

[my_xml_logs_source_type]
KV_MODE = xml
...
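
Once the fields are extracted automatically, repeated elements should show up as multivalue fields, so the two email fields still have to be paired up record by record. A minimal sketch of one way to do that follows. It uses makeresults and a bare spath as a stand-in for the automatic extraction, and it assumes the fields end up named records.record.NotificationEmail and records.record.Email and that every record carries both elements (otherwise mvzip would pair values from different records). The pair, nemail and email field names and the "|" delimiter are only illustrative choices.

```spl
| makeresults
| eval _raw="<records><record><Id>10</Id><NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail><Email>Sam.Lemke@mckenzie.info</Email></record><record><Id>12</Id><NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail><Email>Gabrielle@myrl.biz</Email></record></records>"
| spath
| eval pair=mvzip('records.record.NotificationEmail', 'records.record.Email', "|")
| mvexpand pair
| eval nemail=mvindex(split(pair, "|"), 0), email=mvindex(split(pair, "|"), 1)
| where nemail!=email
| table nemail email
```

Here mvzip joins the n-th NotificationEmail with the n-th Email, mvexpand turns each pair into its own result, and split/mvindex pull the two halves apart again for the comparison. Against real events, the first three lines would be replaced by the base search on the source type.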

martin_mueller
SplunkTrust

Parsing XML with regex is a painful process, especially considering Splunk has commands tailored specifically for this.

Note that your example is not valid XML: element names cannot contain spaces. Once that's fixed, you can run this:

 search for your events | spath records.record | mvexpand records.record | spath input=records.record | where NOT Email=NotificationEmail

That will extract each record into its own event, parse the elements of the record, and filter according to the email fields.
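
For anyone who wants to try this without indexed data, here is a self-contained sketch of the same approach. The makeresults/eval lines only fabricate one test event from the sample in the question (with the space-named elements left out), and the output=record field name is an illustrative choice rather than part of the search above.

```spl
| makeresults
| eval _raw="<records><record><Id>10</Id><NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail><Email>Sam.Lemke@mckenzie.info</Email></record><record><Id>12</Id><NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail><Email>Gabrielle@myrl.biz</Email></record></records>"
| spath path=records.record output=record
| mvexpand record
| spath input=record
| where NOT Email=NotificationEmail
| table Id NotificationEmail Email
```

On these two sample records the result should be just the record with Id 12, since that is the only one whose two addresses differ.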

lguinn2
Legend

The problem is that you need to extract multiple copies of the fields, assuming that the event is defined by the "<records>" tag.
Within the event, you have multiple values. There are a couple of ways to deal with this, but one would be:

yoursearchhere
| rex max_match=0 "(?s)\<record\>(?<record>.*?)\</record\>"
| mvexpand record
| rex field=record "(?is)NotificationEmail>(?<nemail>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<.*Email>(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<"
| where nemail!=email

The first rex and mvexpand break the original event into multiple events, one for each "record." After that, the original rex is applied to each record and the comparison is made. I didn't verify that the email regular expression itself is correct. Personally, I would have done something much simpler:

| rex field=record "(?s)\<NotificationEmail\>(?<nemail>.*?)\</NotificationEmail\>.*?\<Email\>(?<email>.*?)\</Email\>"
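
Put together, the simpler variant might look like the sketch below. The makeresults/eval lines just build one test event from the sample in the question and are not part of the real search. The (?s) flag is there on the assumption that real events span several lines, so that "." can cross line breaks, and field=record makes the second rex read the expanded record instead of _raw.

```spl
| makeresults
| eval _raw="<records><record><Id>10</Id><NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail><Email>Sam.Lemke@mckenzie.info</Email></record><record><Id>12</Id><NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail><Email>Gabrielle@myrl.biz</Email></record></records>"
| rex max_match=0 "(?s)<record>(?<record>.*?)</record>"
| mvexpand record
| rex field=record "(?s)<NotificationEmail>(?<nemail>.*?)</NotificationEmail>.*?<Email>(?<email>.*?)</Email>"
| where nemail!=email
| table nemail email
```

This version assumes NotificationEmail always comes before Email inside a record, as it does in the sample. If the order can vary, two separate rex commands (one per field) would be safer.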

somesoni2
Revered Legend

Do you want to filter the whole response (record set) when any of its records has NotificationEmail equal to Email, or filter only the individual records within a response (record set) that have NotificationEmail equal to Email?
