I have some XML responses logged in Splunk, and they are deeply nested. Say there are multiple records of this form:
<records>
<record>
<Full Name>Ms. Brown Grimes</Full Name>
<Country>Dronning Maud Land</Country>
<NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail>
<Created At>Fri Aug 25 1989 22:17:00 GMT-0700 (Pacific Daylight Time)</Created At>
<Id>10</Id>
<Email>Sam.Lemke@mckenzie.info</Email>
</record>
<record>
<Full Name>Irma Ledner I</Full Name>
<Country>Vatican City</Country>
<NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail>
<Created At>Tue Nov 30 1993 08:16:58 GMT-0800 (Pacific Standard Time)</Created At>
<Id>12</Id>
<Email>Gabrielle@myrl.biz</Email>
</record>
</records>
Now I want to find all records where NotificationEmail is not equal to Email.
What I tried was piping the events to the regex extractor:
rex "<record.*NotificationEmail>(?<nemail>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<.*Email>(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<"
where \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b is the regex that matches an email address.
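If the rex above extracts nothing, case sensitivity is one thing worth checking: [A-Z] only matches uppercase letters unless the regex is case-insensitive, and every sample address contains lowercase letters right before the @. The Python sketch below tests the same email pattern outside Splunk. It assumes Python's re module treats this simple character-class pattern the same way Splunk's PCRE engine does; in Splunk, the inline flag (?i) at the start of the pattern should play the role of re.IGNORECASE.

```python
import re

# Email pattern copied unchanged from the rex above. Without a case-insensitive
# flag, [A-Z] matches uppercase letters only (assumed to match PCRE behavior here).
EMAIL = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b"

samples = ["Sam.Lemke@mckenzie.info", "GabrielleGmail@gmail.com", "Gabrielle@myrl.biz"]

for s in samples:
    # Default (case-sensitive) search versus case-insensitive search.
    case_sensitive = re.search(EMAIL, s)
    case_insensitive = re.search(EMAIL, s, re.IGNORECASE)
    print(f"{s}: default={bool(case_sensitive)} ignorecase={bool(case_insensitive)}")
```

Each sample prints default=False and ignorecase=True, because the character just before each @ is a lowercase letter that the uppercase-only class cannot match.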
You can let Splunk extract all the XML fields automatically by changing the props.conf file in the application of interest (say, search).
Here is an example stanza:
[my_xml_logs_source_type]
KV_MODE = xml
...
Parsing XML with regex is a painful process, especially since Splunk has commands tailored specifically for it.
Note that your example is not valid XML: element names cannot contain spaces (for example, <Full Name> and <Created At>). Once that's fixed, you can run this:
search for your events | spath records.record | mvexpand records.record | spath input=records.record | where NOT Email=NotificationEmail
That will extract each record into its own event, parse the elements of the record, and filter according to the email fields.
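To make the three steps of that pipeline concrete, here is a rough Python analogy using the standard library's xml.etree.ElementTree instead of Splunk; it is not how spath or mvexpand are implemented. It assumes the sample response with the spaces removed from the element names, so FullName and CreatedAt are hypothetical corrected names.

```python
import xml.etree.ElementTree as ET

# Hypothetical corrected sample: <Full Name> and <Created At> renamed to
# <FullName> and <CreatedAt>, because element names cannot contain spaces.
RESPONSE = """<records>
<record>
<FullName>Ms. Brown Grimes</FullName>
<Country>Dronning Maud Land</Country>
<NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail>
<CreatedAt>Fri Aug 25 1989 22:17:00 GMT-0700 (Pacific Daylight Time)</CreatedAt>
<Id>10</Id>
<Email>Sam.Lemke@mckenzie.info</Email>
</record>
<record>
<FullName>Irma Ledner I</FullName>
<Country>Vatican City</Country>
<NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail>
<CreatedAt>Tue Nov 30 1993 08:16:58 GMT-0800 (Pacific Standard Time)</CreatedAt>
<Id>12</Id>
<Email>Gabrielle@myrl.biz</Email>
</record>
</records>"""


def mismatched_records(xml_text):
    """Return one dict per <record> whose NotificationEmail differs from Email."""
    root = ET.fromstring(xml_text)
    rows = []
    # Analogous to spath records.record | mvexpand: handle each <record> separately.
    for record in root.findall("record"):
        # Analogous to spath input=...: turn each child element into a field.
        fields = {child.tag: (child.text or "").strip() for child in record}
        # Analogous to where NOT Email=NotificationEmail.
        if fields.get("Email") != fields.get("NotificationEmail"):
            rows.append(fields)
    return rows


for row in mismatched_records(RESPONSE):
    print(row["Id"], row["NotificationEmail"], row["Email"])
```

With the sample data, only the record with Id 12 is kept, since its two addresses differ.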
The problem is that you need to extract multiple copies of the fields, assuming that the event is defined by the <records> tag: within one event, each field has multiple values. There are a couple of ways to deal with this; one would be:
yoursearchhere
| rex maxmatch=0 "(?s)\<record\>(?<record>.*?)\</record\>"
| mvexpand record
| rex field=record "(?is)NotificationEmail>(?<nemail>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<.*Email>(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<"
| where nemail!=email
The first rex and mvexpand break the original event into multiple events, one for each record. After that, the email rex is applied to each record and the comparison is made. I didn't verify that the regular expression is correct. Personally, I would have done something much simpler:
| rex field=record "(?s)\<NotificationEmail\>(?<nemail>.*?)\</NotificationEmail\>.*?\<Email\>(?<email>.*?)\</Email\>"
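The rex, mvexpand, rex, where sequence can be mirrored in a short Python sketch to check the regular expressions against the sample data before running them in Splunk. It is an analogy under a few assumptions: Python's re is assumed to match these simple patterns the way Splunk's PCRE does, Python needs the (?P<name>...) form for named groups, and re.DOTALL stands in for the (?s) flag that lets .*? cross line breaks when an event spans several lines. The sample event is trimmed to the fields that matter.

```python
import re

# Trimmed sample event; the invalid element names do not matter to a regex.
EVENT = """<records>
<record>
<Full Name>Ms. Brown Grimes</Full Name>
<NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail>
<Id>10</Id>
<Email>Sam.Lemke@mckenzie.info</Email>
</record>
<record>
<Full Name>Irma Ledner I</Full Name>
<NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail>
<Id>12</Id>
<Email>Gabrielle@myrl.biz</Email>
</record>
</records>"""

# Step 1, like rex maxmatch=0: capture every record body (lazy, across lines).
RECORD_RE = re.compile(r"<record>(.*?)</record>", re.DOTALL)

# Step 3, like the simpler rex: lazy captures for the two email elements.
EMAILS_RE = re.compile(
    r"<NotificationEmail>(?P<nemail>.*?)</NotificationEmail>.*?<Email>(?P<email>.*?)</Email>",
    re.DOTALL,
)


def mismatched(event):
    """Return (nemail, email) pairs for records whose two addresses differ."""
    results = []
    # Step 2, like mvexpand: handle each record body on its own.
    for body in RECORD_RE.findall(event):
        m = EMAILS_RE.search(body)
        # Step 4, like where nemail!=email.
        if m and m.group("nemail") != m.group("email"):
            results.append((m.group("nemail"), m.group("email")))
    return results


print(mismatched(EVENT))
```

This prints [('GabrielleGmail@gmail.com', 'Gabrielle@myrl.biz')]. Note that <records> is not matched by the record pattern, because it requires > directly after "record".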
Do you want to filter the whole response (record set) where any record has NotificationEmail equal to Email, or filter the individual records within a response (record set) that have NotificationEmail equal to Email?