Multivalue XML extraction not working

responsys_cm
Builder

I'm trying to add several lines of XML to a multi-valued field. The data looks like:

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[PHP Hashtables Denial of Service - The Exploit-DB Ref : 18296]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18296]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[PHP Hash Table Collision Proof Of Concept - The Exploit-DB Ref : 18305]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18305]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4153]]></REF>

<DESC><![CDATA[PHP 5.3.8 Multiple Vulnerabilities - The Exploit-DB Ref : 18370]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[MyBulletinBoard (MyBB) <= 1.1.5 (CLIENT-IP) SQL Injection Exploit - The Exploit-DB Ref : 2012]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/2012]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2012-0781]]></REF>

<DESC><![CDATA[PHP 5.3.8 Multiple Vulnerabilities - The Exploit-DB Ref : 18370]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]></LINK>

</EXPLT>

My transforms.conf looks like:

[qualys_exploit]

REGEX = (?mis)(<EXPLT>.*</EXPLT>)

FORMAT = qualys_exploit::$1

MV_ADD = true

props.conf:

REPORT-qualys_exploit = qualys_exploit

Splunk is taking everything between the first opening EXPLT tag and last closing EXPLT tag and making it a single event. What am I doing wrong that it's not treating these as multiple individual events?

Thx.

C

1 Solution

andreas
Explorer

The * quantifier in the REGEX is greedy, so the expression .* consumes every character up to the last </EXPLT>.
Try adding a ? after the * to make it non-greedy, so the match stops at the next </EXPLT> rather than the last.

REGEX = (?mis)(<EXPLT>.*?</EXPLT>)
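
If you want to see the difference outside Splunk, here is a minimal sketch in Python. It assumes Python's re module behaves like Splunk's PCRE engine for this particular pattern (greedy versus non-greedy quantifiers work the same way in both), and the sample string is just two of the EXPLT blocks from the question with the blank lines removed.

```python
import re

# Two EXPLT blocks taken from the question's sample data.
sample = """<EXPLT>
<REF><![CDATA[CVE-2011-4885]]></REF>
<DESC><![CDATA[PHP Hashtables Denial of Service - The Exploit-DB Ref : 18296]]></DESC>
<LINK><![CDATA[http://www.exploit-db.com/exploits/18296]]></LINK>
</EXPLT>
<EXPLT>
<REF><![CDATA[CVE-2011-4153]]></REF>
<DESC><![CDATA[PHP 5.3.8 Multiple Vulnerabilities - The Exploit-DB Ref : 18370]]></DESC>
<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]></LINK>
</EXPLT>"""

# Greedy: .* runs to the LAST </EXPLT>, so both blocks land in one match.
greedy = re.findall(r"(?mis)(<EXPLT>.*</EXPLT>)", sample)

# Non-greedy: .*? stops at the NEXT </EXPLT>, giving one match per block.
lazy = re.findall(r"(?mis)(<EXPLT>.*?</EXPLT>)", sample)

print(len(greedy), "match with the greedy pattern")
print(len(lazy), "matches with the non-greedy pattern")
```

With MV_ADD = true in transforms.conf, each of those separate matches should be added as its own value of qualys_exploit instead of one large value.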
