Splunk Search

Multivalue XML extraction not working

Builder

I'm trying to extract several blocks of XML into a multivalue field. The data looks like:

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[PHP Hashtables Denial of Service - The Exploit-DB Ref : 18296]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18296]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[PHP Hash Table Collision Proof Of Concept - The Exploit-DB Ref : 18305]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18305]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4153]]></REF>

<DESC><![CDATA[PHP 5.3.8 Multiple Vulnerabilities - The Exploit-DB Ref : 18370]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[MyBulletinBoard (MyBB) <= 1.1.5 (CLIENT-IP) SQL Injection Exploit - The Exploit-DB Ref : 2012]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/2012]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2012-0781]]></REF>

<DESC><![CDATA[PHP 5.3.8 Multiple Vulnerabilities - The Exploit-DB Ref : 18370]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]></LINK>

</EXPLT>

My transforms.conf looks like:

[qualys_exploit]

REGEX = (?mis)(<EXPLT>.*</EXPLT>)

FORMAT = qualys_exploit::$1

MV_ADD = true

props.conf:

REPORT-qualys_exploit = qualys_exploit

Splunk is taking everything between the first opening EXPLT tag and the last closing EXPLT tag and treating it as a single value. What am I doing wrong that it's not extracting these as multiple individual values?

Thx.

C


Explorer

The quantifier * in the REGEX is greedy, so the expression .* consumes every character up to the last </EXPLT>.
Try adding a ? after the * to make it non-greedy, so the regex stops at the next </EXPLT>, not the last.

REGEX = (?mis)(<EXPLT>.*?</EXPLT>)
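
To see the difference outside Splunk, here is a minimal sketch in Python. It assumes Python's re module treats greedy and non-greedy quantifiers the same way as Splunk's PCRE engine for this pattern, and the sample string is a shortened, illustrative version of the XML from the question, not the full event.

```python
import re

# Shortened sample modeled on the XML in the question (two EXPLT blocks).
sample = """<EXPLT>
<REF><![CDATA[CVE-2011-4885]]></REF>
<LINK><![CDATA[http://www.exploit-db.com/exploits/18296]]></LINK>
</EXPLT>
<EXPLT>
<REF><![CDATA[CVE-2011-4153]]></REF>
<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]></LINK>
</EXPLT>"""

# Greedy: .* runs to the LAST </EXPLT>, so both blocks land in one match.
greedy = re.findall(r"(?mis)(<EXPLT>.*</EXPLT>)", sample)

# Non-greedy: .*? stops at the NEXT </EXPLT>, giving one match per block.
lazy = re.findall(r"(?mis)(<EXPLT>.*?</EXPLT>)", sample)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

With MV_ADD = true, each separate match is what ends up as its own value in the multivalue field, so the non-greedy pattern is the one that yields one value per EXPLT block.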
