Splunk Search

How to extract multiple values from XML logs and display all events where FieldA is not equal to FieldB?

anilkamath
Engager

I have some XML responses logged in Splunk which are pretty nested. Let's say there are multiple records of the form:

<records>
  <record>
    <Full Name>Ms. Brown Grimes</Full Name>
    <Country>Dronning Maud Land</Country>
    <NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail>
    <Created At>Fri Aug 25 1989 22:17:00 GMT-0700 (Pacific Daylight Time)</Created At>
    <Id>10</Id>
    <Email>Sam.Lemke@mckenzie.info</Email>
  </record>
  <record>
    <Full Name>Irma Ledner I</Full Name>
    <Country>Vatican City</Country>
    <NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail>
    <Created At>Tue Nov 30 1993 08:16:58 GMT-0800 (Pacific Standard Time)</Created At>
    <Id>12</Id>
    <Email>Gabrielle@myrl.biz</Email>
  </record>
</records>

Now I want to find all records where NotificationEmail is not equal to Email.

What I tried was piping the events to a regex extraction:

rex "<record.*NotificationEmail>(?<nemail>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<.*Email>(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<"

where \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b is the regex to match email.

mIliofotou_splu
Splunk Employee

You can let Splunk extract all the XML fields automatically by changing the props.conf file in the app of interest (say, search).

Here is a stanza example:

[my_xml_logs_source_type]
KV_MODE = xml
...
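
As a rough sketch of how the automatically extracted fields could then be compared: with KV_MODE = xml the fields are typically named after their XML path, and they become multivalued when one event holds several records. The field names below are assumed from the sample in the question, nemail and email are names made up for this example, and the pairing only lines up if every record contains both elements.

```
sourcetype=my_xml_logs_source_type
| eval pair=mvzip('records.record.NotificationEmail', 'records.record.Email')
| mvexpand pair
| eval nemail=mvindex(split(pair, ","), 0), email=mvindex(split(pair, ","), 1)
| where nemail!=email
| table nemail email
```

mvzip joins the two multivalue fields position by position (with a comma by default), so each expanded row carries one NotificationEmail/Email pair.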

martin_mueller
SplunkTrust

Parsing XML with regex is a painful process, especially considering Splunk has commands tailored specifically for this.

Note that your example is not valid XML: element names cannot contain spaces (as in <Full Name> and <Created At>). Once that's fixed, you can run this:

 search for your events | spath records.record | mvexpand records.record | spath input=records.record | where NOT Email=NotificationEmail

That will extract each record into its own event, parse the elements of the record, and filter according to the email fields.
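
To try the idea without touching real data, something along these lines can be pasted into the search bar. It is only a sketch: the event is synthetic (built with makeresults), the addresses are made up, and the field name record is chosen here for illustration.

```
| makeresults
| eval _raw="<records><record><Id>10</Id><NotificationEmail>sam@example.com</NotificationEmail><Email>sam@example.com</Email></record><record><Id>12</Id><NotificationEmail>gabrielle@example.org</NotificationEmail><Email>gabrielle@example.com</Email></record></records>"
| spath path=records.record output=record
| mvexpand record
| spath input=record
| where NOT Email=NotificationEmail
| table Id NotificationEmail Email
```

In this sample only the record with Id 12 has two different addresses, so that is the one the final where is meant to keep.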

lguinn2
Legend

The problem is that you need to extract multiple copies of the fields, assuming that each event is delimited by the "<records>" tag.
Within the event, you have multiple values. There are a couple of ways to deal with this, but one would be:

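yoursearchhere
| rex max_match=0 "(?s)\<record\>(?<record>.*?)\</record\>"
| mvexpand record
| rex field=record "(?is)NotificationEmail>(?<nemail>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<.*Email>(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<"
| where nemail!=email

The first rex and mvexpand break the original event into multiple events, one for each "record." After that, the original rex is applied to each expanded record and the comparison is made. Three details matter here: the option is spelled max_match; field=record is needed because rex reads _raw by default, which is unchanged after mvexpand; and (?i) lets the uppercase-only character classes match lowercase addresses, while (?s) lets . span line breaks. I didn't verify that the regular expression is correct. Personally, I would have done something much simpler: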

| rex field=record "(?s)\<NotificationEmail\>(?<nemail>.*?)\</NotificationEmail\>.*?\<Email\>(?<email>.*?)\</Email\>"
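
For anyone who wants to experiment, here is a self-contained sketch of the rex-based approach using the simpler pattern. The event is synthetic (built with makeresults) and the addresses are made up; record, nemail and email are just the capture names used in this answer.

```
| makeresults
| eval _raw="<records><record><Id>10</Id><NotificationEmail>sam@example.com</NotificationEmail><Email>sam@example.com</Email></record><record><Id>12</Id><NotificationEmail>gabrielle@example.org</NotificationEmail><Email>gabrielle@example.com</Email></record></records>"
| rex max_match=0 "(?s)<record>(?<record>.*?)</record>"
| mvexpand record
| rex field=record "(?s)<NotificationEmail>(?<nemail>.*?)</NotificationEmail>.*?<Email>(?<email>.*?)</Email>"
| where nemail!=email
| table nemail email
```

Note that this pattern assumes NotificationEmail appears before Email inside each record, as it does in the sample from the question.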

somesoni2
Revered Legend

Do you want to filter the whole response (record set) where any record has NotificationEmail equal to Email, or filter only the individual records within a response (record set) that have NotificationEmail equal to Email?
