I am ingesting Qualys data via the Qualys Technology Add-on for Splunk (v1.8.7). To reduce daily volume, I have chosen to ingest incrementally (i.e., do not perform a full backup/pull every day). Fortunately, the TA tags events appropriate to leverage the Vulnerabilities data model. Unfortunately, since I'm only receiving incremental updates every day, I need to perform a dedup after querying the accelerated data model in order to get the current/latest status for each vulnerability. For example, most of my searches begin like this:
| tstats count
by _time, Vulnerabilities.dest, Vulnerabilities.dest_id, Vulnerabilities.dest_host, Vulnerabilities.severity_id, Vulnerabilities.severity, Vulnerabilities.signature_id, Vulnerabilities.status, Vulnerabilities.type span=1s
| fields - count
| dedup Vulnerabilities.signature_id Vulnerabilities.dest_id sortby -_time
The tstats command targets the accelerated data model, and separates each event by vulnerability and host.
The fields command remove the count, since it's not used.
The dedup command removes all events, except for the most recent, on signature_id (QID) and dest_id (HOST_ID).
This works well when filtering individual hosts or QIDs in the tstats where clause, but if I wanted to gather a count of all active, confirmed vulnerabilities by severity and host, The search is impossibly long because it first puts all results into memory before attempting a dedup.
Ideally, part of running the data model acceleration would be able to update the summary with only the latest status, so all that is needed is running tstats against the data model without a dedup. Unfortunately, this doesn't appear to be possible. Anyone have any ideas on how to deal with this issue?
Did you try playing with earliest() and first()? It sounds like that might help you.
Put the fields you use in dedup as "group by" fields and use the rest with first(), or values() or earliest().
I did. I don't believe that will work, because after I get the latest vulnerabilities by dest_id (HOST_ID) and signature_id (QID), I need to aggregate that by doing a count by severity_id and dest_id.