I am trying to parse a bunch of Nessus vulnerability plugin files and extract the CVE and OSVDB reference IDs from each file. Each file is treated as a single event.
The format of the data is different for each plugin (probably because they were written by different people). Here are some samples:
script_cve_id("CVE-2010-4344");
script_cve_id("CVE-2010-3766", "CVE-2010-3767", "CVE-2010-3768", "CVE-2010-3770", "CVE-2010-3771", "CVE-2010-3772", "CVE-2010-3773", "CVE-2010-3774", "CVE-2010-3775", "CVE-2010-3776", "CVE-2010-3777", "CVE-2010-3778");
script_cve_id( "CVE-2010-3512", "CVE-2010-3514", "CVE-2010-3544", "CVE-2010-3545" );
I've tried the following transforms to capture the events, but only a single CVE ID is showing up for each one:
[nessus_plugins_cve]
REGEX = (?mi)script_cve_id\(\s*"CVE-(?P<cve_id>\d+-\d+)(?=",*)
FORMAT = cve_id::$1
MV_ADD = true
Why isn't my regex capturing more than one CVE reference?
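Your regex won't work with MV_ADD, primarily because you have anchored it to 'script_cve_id('. MV_ADD adds one value per complete match of the regex, and that prefix appears only once per call, so only the first ID after it can ever match. Dropping the anchor lets every quoted CVE ID match on its own: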
REGEX = (?mi)\"CVE-(?<cve_id>\d+-\d+)
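To illustrate the difference outside Splunk, here is a minimal Python sketch that runs both patterns repeatedly over one of the sample lines. Python's `re.findall` is only a stand-in for repeated matching and is not how Splunk applies transforms; the variable names are made up for the example.

```python
import re

# One of the sample lines from the question (shortened to three IDs).
sample = 'script_cve_id("CVE-2010-3766", "CVE-2010-3767", "CVE-2010-3768");'

# Pattern anchored to the function name, as in the original transform.
anchored = re.compile(r'(?mi)script_cve_id\(\s*"CVE-(?P<cve_id>\d+-\d+)')

# Pattern without the anchor, as suggested in the answer.
unanchored = re.compile(r'(?mi)"CVE-(?P<cve_id>\d+-\d+)')

# findall returns the captured group for every non-overlapping match.
anchored_ids = anchored.findall(sample)
unanchored_ids = unanchored.findall(sample)

print(anchored_ids)
print(unanchored_ids)
```

The anchored pattern can match only where `script_cve_id(` immediately precedes an ID, which happens once per call; the unanchored one matches at every quoted ID.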
The problem is that some other parts of the plugin configuration make references to CVE numbers that may or may not be associated with that particular plugin.
Is there any way to extract multiple values if the regex is anchored to "script_cve_id"? Worst case, I can capture the entire contents of that field and extract the individual IDs at search time, but I'd rather not if I don't have to...
Nope, there isn't really another way to do it. You might try a negative lookahead or lookbehind in the regex, but those are tricky and expensive. In my Nessus app, there are quite a few quandaries like this due to the sub-optimal structure of the .nessus files. More often than not, I extract the whole CVE list as one field, then extract each entry from that field into a second, multivalued field.
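The two-step approach can be sketched in Python as follows. This is an illustration of the idea only, not Splunk configuration: the function name `extract_cve_ids`, both patterns, and the second line of the sample text (an unrelated CVE mention) are assumptions made up for the example.

```python
import re

# Step 1: capture everything between the parentheses of script_cve_id(...).
LIST_RE = re.compile(r'script_cve_id\s*\(([^)]*)\)', re.IGNORECASE)

# Step 2: pull each quoted CVE ID out of the captured argument list.
ID_RE = re.compile(r'"CVE-(\d+-\d+)"')


def extract_cve_ids(plugin_text):
    """Return the CVE IDs that appear inside script_cve_id(...) calls only."""
    ids = []
    for arg_list in LIST_RE.findall(plugin_text):
        ids.extend(ID_RE.findall(arg_list))
    return ids


# Hypothetical plugin text: the second line mentions a CVE that does not
# belong to this plugin's script_cve_id list and should be ignored.
plugin_text = '''
script_cve_id( "CVE-2010-3512", "CVE-2010-3514" );
script_set_attribute(attribute:"see_also", value:"CVE-2009-9999");
'''

print(extract_cve_ids(plugin_text))
```

Because the second pattern only ever sees the text captured by the first, CVE numbers mentioned elsewhere in the plugin are not picked up.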