I'm trying to extract data from Nessus 5 XML reports. I have configured Splunk to treat each ReportHost element in the XML as a single event. I then extract the individual vulnerabilities (the ReportItem elements) from each event into a multi-valued field, and use mvexpand to turn each vulnerability into its own event. Whether I use rex in the search pipeline, a REPORT extraction in transforms.conf/props.conf, or xpath, search performance is abysmal when searching over hundreds of hosts. The search pegs an entire CPU core, eventually consumes all available memory, and takes 45 minutes or more to run.
Is there any way to structure the search so that it runs faster or uses less memory? Here is a search that uses a transform to extract the vulnerabilities into a multi-valued field called "nessus_exploit" for each host record:
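A simple model of mvexpand helps explain why memory climbs. The Python below is a toy illustration, not Splunk's implementation. Each value of the multi-valued field becomes its own row, and every other field in the record is copied into that row. The search in this question calls mvexpand twice: first on nessus_exploit, then on cve_id. That means each per-vulnerability row is fanned out again. The host record and CVE lists are made-up examples.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, List[str]]]


def mvexpand(rows: List[Row], field: str) -> List[Row]:
    """Toy model of mvexpand: one output row per value of `field`.

    Every other field is copied into each output row, so memory grows with
    (number of values) x (size of the rest of the row).
    """
    out: List[Row] = []
    for row in rows:
        values = row[field]
        if isinstance(values, str):  # a single value behaves like a 1-element list
            values = [values]
        for value in values:
            expanded = dict(row)  # shallow copy of all the other fields
            expanded[field] = value
            out.append(expanded)
    return out


# Hypothetical host record with three vulnerabilities in a multi-valued field.
host: Row = {"dest": "10.0.0.5", "os": "Linux",
             "nessus_exploit": ["vuln-a", "vuln-b", "vuln-c"]}
per_vuln = mvexpand([host], "nessus_exploit")

# Hypothetical CVE lists per vulnerability; the second mvexpand fans out again.
cve_lists = {"vuln-a": ["CVE-1", "CVE-2"], "vuln-b": ["CVE-3"],
             "vuln-c": ["CVE-4", "CVE-5", "CVE-6"]}
for row in per_vuln:
    row["cve_id"] = cve_lists[row["nessus_exploit"]]
per_cve = mvexpand(per_vuln, "cve_id")

print(len(per_vuln), len(per_cve))  # 3 6
```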
index=vulnerabilities sourcetype=nessus |
eval scanner="Nessus" |
eval status="Active" |
rex field=_raw "host-fqdn\">(?P<hostname>[^<.]+)(\.(?P<domainname>[^<]+))?<" |
eval hostname=if(isnull(hostname),dest,hostname) |
eval dest_nt_host=if(isnull(dest_nt_host),"None",dest_nt_host)  |
convert timeformat="%a %b %d %H:%M:%S %Y" mktime(last_scan) |
eval domainname=if(isnull(domainname),"None",domainname) |
mvexpand nessus_exploit |
fields scanner,last_scan,dest,hostname,domainname,dest_nt_host,os,status,nessus_exploit |
rex field=nessus_exploit "(?i)port=\"(?P<dest_port>\d+)\"\ssvc_name=\"(?P<service_name>[^\"]+)\"\sprotocol=\"(?P<protocol>[^\"]+)\"\sseverity=\"(?P<severity>[^\"]+)\"\spluginID=\"(?P<nessus_id>[^\"]+)\"\spluginName=\"(?P<signature_name>[^\"]+)" |
eval severity=severity+1 |
rex field=nessus_exploit max_match=100 "(?i)<cve>CVE-(?P<cve_id>\d+-\d+)" |
rex field=nessus_exploit max_match=100 "(?i)<bid>(?P<bugtraq_id>\d+)" |
rex field=nessus_exploit max_match=100 "(?i)<xref>OSVDB:(?P<osvdb_id>\d+)" |
rex field=nessus_exploit "(?i)solution>(?P<solution>[^<]+)" |
rex field=os max_match=20 "(?i)(?P<os>[^\n]+)" | eval vuln_id=nessus_id  |
eval cve_id=if(isnull(cve_id),"None",cve_id ) |
eval bugtraq_id=if(isnull(bugtraq_id),"None",bugtraq_id ) |
eval osvdb_id=if(isnull(osvdb_id),"None",osvdb_id )  |
lookup bugtraq_cve_lookup bugtraq_id OUTPUT cve_id AS bugtraq_cve |
lookup osvdb_cve_lookup osvdb_id OUTPUT cve_id AS osvdb_cve |
eval cve_combined=mvjoin(cve_id,".") | eval bugtraq_cve=mvjoin(bugtraq_cve,".") |
eval osvdb_cve=mvjoin(osvdb_cve,".") |
eval cve_id=toString(cve_combined)+"."+toString(bugtraq_cve)+"."+toString(osvdb_cve) |
makemv delim="." cve_id | mvexpand cve_id |
where cve_id!="Null" AND now()-last_scan<=2592000  |
dedup dest,protocol,dest_port,vuln_id,cve_id |
lookup nvdb_cvss_lookup cve_id OUTPUT cve_score |
eval cve_score=if(isnull(cve_score),"0.0",cve_score) |
table scanner,last_scan,dest,hostname,domainname,dest_nt_host,protocol,dest_port,cve_id,cve_score,bugtraq_id,osvdb_id,severity,signature_name,vuln_id,solution |
outputlookup open_vulnerabilities_lookup.csv
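Two constants in the search are easy to misread. The convert command parses last_scan with the format "%a %b %d %H:%M:%S %Y". The where clause then keeps rows with now()-last_scan<=2592000, which means scans from the last 30 days (30 × 86,400 s). Below is a small Python sketch of the same check. It assumes naive timestamps, ignoring time zones unlike Splunk's mktime, and an English locale for day and month names. The sample last_scan value is hypothetical.

In the search above, this check only runs after both mvexpand commands. Because last_scan is a host-level field, that part of the filter could in principle be applied before the first mvexpand. Whether that helps on this data is untested.

```python
from datetime import datetime

# Format string copied from the search's convert command.
LAST_SCAN_FORMAT = "%a %b %d %H:%M:%S %Y"
# Literal from the where clause: 30 days x 24 h x 60 min x 60 s.
WINDOW_SECONDS = 2592000
assert WINDOW_SECONDS == 30 * 24 * 60 * 60


def scanned_within_window(last_scan: str, now: datetime) -> bool:
    """Mirror of now()-last_scan<=2592000, using naive timestamps (no time zones)."""
    scanned = datetime.strptime(last_scan, LAST_SCAN_FORMAT)
    return (now - scanned).total_seconds() <= WINDOW_SECONDS


# Hypothetical last_scan value in the layout the convert command expects.
print(scanned_within_window("Tue Mar 05 14:12:09 2013", datetime(2013, 3, 20)))  # True
print(scanned_within_window("Tue Mar 05 14:12:09 2013", datetime(2013, 4, 10)))  # False
```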
Here is the search performance data. It looks like command.search.kv is taking the most time.
Execution costs
Duration (seconds)  Component                       Invocations  Input count  Output count
0.016               command.convert                 26           987          987
0.649               command.dedup                   26           38,864       36,532
0.912               command.eval                    390          732,910      732,910
0.032               command.fields                  52           167,308      167,308
1.384               command.lookup                  78           190,186      190,186
0.24                command.makemv                  26           76,827       76,827
149.494             command.mvexpand                52           77,814       318,424
8.716               command.prededup                26           90,481       38,865
2.763               command.rex                     182          461,949      461,949
2,788.773           command.search                  26           -            987
2,785.98            command.search.kv               3            -            -
1.408               command.search.rawdata          3            -            -
0.827               command.search.typer            26           987          987
0.502               command.search.fieldalias       3            2,746        2,746
0                   command.search.calcfields       3            2,746        2,746
0                   command.search.filter           3            -            -
0                   command.search.index            3            -            -
0                   command.search.lookups          3            2,746        2,746
0                   command.search.tags             26           987          987
14.574              command.where                   26           241,597      90,481
0.11                dispatch.createProviderQueue    1            -            -
0.219               dispatch.evaluate               1            -            -
0.204               dispatch.evaluate.search        1            -            -
0                   dispatch.evaluate.convert       1            -            -
0                   dispatch.evaluate.dedup         1            -            -
0                   dispatch.evaluate.eval          15           -            -
0                   dispatch.evaluate.fields        1            -            -
0                   dispatch.evaluate.lookup        3            -            -
0                   dispatch.evaluate.makemv        1            -            -
0                   dispatch.evaluate.mvexpand      2            -            -
0                   dispatch.evaluate.outputlookup  1            -            -
0                   dispatch.evaluate.rex           7            -            -
0                   dispatch.evaluate.table         1            -            -
0                   dispatch.evaluate.where         1            -            -
2,966.083           dispatch.fetch                  26           -            -
147.375             dispatch.preview                3            -            -
0                   dispatch.reduce                 1            -            -
0.192               dispatch.results_combiner       26           -            -
2,966.068           dispatch.stream.local           26           -            -
171.246             dispatch.timeline               26           -            -
0.875               startup.handoff                 1            -            -
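The table is easier to read as shares of the total fetch time. Several components overlap: for example, command.search.kv is counted inside command.search. As a result, the shares do not add up to 100%. The kv component is presumably Splunk's search-time field extraction phase, where REPORT transforms such as the one that builds nessus_exploit run. Treat that reading as an assumption. Here is a quick Python check using figures copied from the table:

```python
# Durations in seconds, copied from the Execution costs table.
fetch_total = 2966.083      # dispatch.fetch
search_total = 2788.773     # command.search
search_kv = 2785.98         # command.search.kv
search_rawdata = 1.408      # command.search.rawdata
mvexpand_time = 149.494     # command.mvexpand

# Row counts for command.mvexpand (input -> output), also from the table.
mvexpand_in, mvexpand_out = 77814, 318424

kv_share = search_kv / fetch_total
rawdata_share = search_rawdata / fetch_total
mvexpand_share = mvexpand_time / fetch_total
fanout = mvexpand_out / mvexpand_in
search_minutes = search_total / 60

print(f"search.kv:      {kv_share:.1%} of fetch time")
print(f"search.rawdata: {rawdata_share:.2%} of fetch time")
print(f"mvexpand:       {mvexpand_share:.1%} of fetch time, {fanout:.1f}x row fan-out")
print(f"command.search: {search_minutes:.1f} minutes")
```

Reading raw data from disk is a tiny fraction of the run. Most of the time goes into extracting fields from those events.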
What can I do to speed this up?
Thx.
Craig
Well, your search command seems to be taking most of the time:
2,788.773 command.search 26 - 987
Have you tried switching to faster media? The search is the main part of your wait: 2,788 s / 60 ≈ 46 minutes.
I noticed you said you tried xpath, but have you tried spath? Run without any arguments, spath performs automatic extraction, which may simplify some of the work you are doing. You could also try an intermediate heavy forwarder that uses transforms.conf and props.conf to send cooked (parsed) data. That would take some of the load off your indexers and search head.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath
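The idea behind spath is to let something that understands the XML structure do the parsing, rather than running several regexes over a large multi-valued field. The Python sketch below illustrates that idea outside Splunk. It is not spath itself and not a forwarder configuration. It walks a trimmed, hypothetical .nessus (v2) fragment and emits one flat record per ReportItem, with the host's properties copied in, so no mvexpand over nessus_exploit is needed afterwards. Field names mirror the rex captures in the question's search.

The sample XML, its values, and the names report_items and _texts are made up for the example. CVE, BID and OSVDB references stay as short lists on each record.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, heavily trimmed .nessus (v2) fragment; real reports carry many more tags.
SAMPLE = """<NessusClientData_v2>
  <Report name="example">
    <ReportHost name="10.0.0.5">
      <HostProperties>
        <tag name="host-fqdn">web01.example.com</tag>
      </HostProperties>
      <ReportItem port="443" svc_name="www" protocol="tcp" severity="2"
                  pluginID="10001" pluginName="Example plugin A">
        <solution>Upgrade the affected package.</solution>
        <cve>CVE-2012-0001</cve>
        <cve>CVE-2012-0002</cve>
        <bid>50000</bid>
      </ReportItem>
      <ReportItem port="22" svc_name="ssh" protocol="tcp" severity="1"
                  pluginID="10002" pluginName="Example plugin B">
        <xref>OSVDB:12345</xref>
      </ReportItem>
    </ReportHost>
  </Report>
</NessusClientData_v2>"""


def _texts(elem: ET.Element, tag: str) -> List[str]:
    """Stripped text of every direct child named `tag`."""
    return [(e.text or "").strip() for e in elem.findall(tag)]


def report_items(xml_text: str) -> List[Dict[str, object]]:
    """Return one flat record per ReportItem, with host-level fields copied in."""
    records: List[Dict[str, object]] = []
    root = ET.fromstring(xml_text)
    for host in root.iter("ReportHost"):
        props = {t.get("name"): (t.text or "") for t in host.findall("HostProperties/tag")}
        dest = host.get("name", "")
        # Split the FQDN at the first dot, as the host-fqdn rex intends.
        hostname, _, domainname = props.get("host-fqdn", "").partition(".")
        for item in host.iter("ReportItem"):
            records.append({
                "dest": dest,
                "hostname": hostname or dest,        # like if(isnull(hostname),dest,hostname)
                "domainname": domainname or "None",  # like the "None" default in the search
                "dest_port": item.get("port"),
                "service_name": item.get("svc_name"),
                "protocol": item.get("protocol"),
                "severity": int(item.get("severity", "0")) + 1,  # like eval severity=severity+1
                "nessus_id": item.get("pluginID"),
                "signature_name": item.get("pluginName"),
                "solution": item.findtext("solution"),
                "cve_id": [c[len("CVE-"):] for c in _texts(item, "cve") if c.startswith("CVE-")],
                "bugtraq_id": _texts(item, "bid"),
                "osvdb_id": [x.split(":", 1)[1] for x in _texts(item, "xref")
                             if x.startswith("OSVDB:")],
            })
    return records


for rec in report_items(SAMPLE):
    print(rec["hostname"], rec["dest_port"], rec["nessus_id"], rec["cve_id"])
```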
