Splunk Search

search with joins and append takes too long

claatu
Explorer

Goal is to determine, from specific vulnerabilities found in scans, the percentage that have been ‘fixed’, meaning they are not found in the latest scans. So first we get the vulns for all time, then check the ones related to the latest scans. So alltime – latest = fixed. And fixed / alltime * 100 = percentage fixed.
The search/query takes way too long. How to speed it up?
Here is the approach.

sourcetype=alpha
| eval comment=”the above gets the results of the scans All Time”
| eval comment=”cve is multivalue field, so split it up so we can lookup a specific CVE”
| makemv delim=";" cve | mvexpand cve | rename cve AS CVE
| eval comment=”now just look at the ones we care about”
| join CVE [|inputlookup PriorityCVE.csv]
| eval comment=”label these as ‘alltime’”
| eval ReportKey="alltime"
| eval comment=”now get the scans representing the latest info”
| append [search sourcetype=alpha | makemv delim=";" cve | mvexpand cve | rename cve AS CVE
| join CVE [|inputlookup PriorityCVE_test.csv]
| eval comment=”last_scan_finished is in terms of date not granularity of _time”
| convert timeformat="%F" ctime(_time) AS CurScan
| eval comment=”the beta source has events over time with last_scan_finished per asset”
| join asset_id, CurScan [search sourcetype=beta | dedup asset_id last_scan_finished
| fields asset_id last_scan_finished | sort 0 - last_scan_finished | dedup asset_id
| convert timeformat="%Y-%m-%d %H:%M:%S.%Q" mktime(last_scan_finished) AS LastScanTmp
| convert timeformat="%F" ctime(LastScanTmp) AS CurScan | fields asset_id CurScan]
| eval comment=”label those that match the last_scan_finished date as ‘nowtime’”
| eval ReportKey="nowtime”
| table asset_id CVE ReportKey
| eval comment=“get counts”
| stats values(ReportKey) as ReportKey by asset_id CVE
| eval comment=”if you only have the alltime label, you have been fixed”
| eval status=case(mvcount(ReportKey)=2, "Persistent", ReportKey="alltime", "Fixed", true(), "New")
| eval comment=”(since all scans are in alltime, there should be no ‘new’)”
| eval comment=”need to get the counts into variables for the equation”
| stats count AS stots BY status | fields stots | mvcombine delim="," stots
| nomv stots | rex field=stots "(?<tot_fix>.),(?<tot_persist>.)"

| eval comment=”now do the calculation”
| eval progress=(tot_fix / (tot_fix + tot_persist)) * 100
| table progress

I also tried an approach that put the last_scan_finished per asset in a lookup table, avoiding the redundant search, but that did not speed it up much (and would introduce the need to keep updating a lookup table).

Ideas?

Tags (2)
0 Karma
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

I'm not going to attempt to follow a wall of search without knowing what your data looks like or what the job inspector output is, so here are some general pointers after skimming over:

View solution in original post

elliotproebstel
Champion

I'm adding this as an answer because I can't add a screenshot to a comment - but this is in response to your thought that you couldn't configure a lookup without file access to transforms.conf. You actually can achieve the same thing through the UI. Go to Settings > Lookups, and that will take you to this menu:
alt text

Under Add New, you'll find the settings page to configure your existing CSV as a full lookup, allowing you to use the lookup command.

0 Karma

elliotproebstel
Champion

Here's the screenshot again:

alt text

0 Karma

MuS
SplunkTrust
SplunkTrust

Thanks for uploading the screenshot again!

0 Karma

MuS
SplunkTrust
SplunkTrust

@elliotproebstel, sorry I accidentally deleted the screenshot while trying to make it show 😞

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

I'm not going to attempt to follow a wall of search without knowing what your data looks like or what the job inspector output is, so here are some general pointers after skimming over:

View solution in original post

martin_mueller
SplunkTrust
SplunkTrust

Yeah, lookup doesn't filter on its own, but you can filter based on its output fields.

Additionally, if you extract your cve field as multivalue right away, you can automatically apply the lookup and filter based on a lookup output field in the generating search - before even the mvexpand.

Alternatively, if your lookup is a strong restriction (only few events match the lookup), you could consider this pattern: sourcetype=alpha [inputlookup PriorityCVE.csv | rename CVE as cve]
That will only load events that have a matching cve value, also requires extracting the cve field as multivalue right away.

0 Karma

claatu
Explorer

I believe lookup does not filter out events, unless you use a where afterwards. And it is only for the lookup table (not the subsearch), and the table is small, so I think not much gain there.

I was able to eliminate the append search, which greatly sped the thing up, and that is somewhat related to the 129424 answer linked. I restricted fields before the mvexpand. I also changed the end calculation to account for possible zeroes (lack of a status type).

Here is the latest (final?) version with some abbreviation.

sourcetype=alpha | fields _time cve asset_id

| makemv delim=";" cve | mvexpand cve
| rename cve AS CVE | join CVE [|inputlookup PriorityCVE.csv]
| convert timeformat="%F" ctime(_time) AS CurScan
| dedup asset_id CVE CurScan
| join asset_id, CurScan type=outer [search sourcetype=beta|... obtain last_scan_finished date AS CurScan
| eval vulnstat="nowtime" | fields asset_id CurScan vulnstat]
| eval ReportKey=if(isnull(vulnstat),"alltime","nowtime")
| table asset_id CVE CurScan ReportKey
| stats values(ReportKey) as ReportKey, values(CVE) AS CVEs by asset_id CVE
| eval status=case(mvcount(ReportKey)=2, "Persistent", ReportKey="alltime", "Fixed", true(), "New")
| stats count(eval(status="Fixed")) AS tFixed, count(eval(status="New")) AS tNew, count(eval(status="Persistent")) AS tPersist
| eval progress=(tFixed / (tFixed + tNew + tPersist)) * 100
| table progress

martin_mueller
SplunkTrust
SplunkTrust

The macro takes zero computational effort per event because it's removed before the search is launched.
The macro can be placed virtually anywhere, not just between commands.
The macro is the documented approach, easing transition between environments or users.

0 Karma

claatu
Explorer

I just added the eval comment thing for this posting, I don't actually have it in the real search. But will keep the macro thing in mind for future posts.

When I read about "lookup", it said the table name had to be in transforms.conf, which I don't have access to.

Will take a look at the link about multiple sourcetypes, but I have read plenty of search-compare type articles and still haven't found the magic.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

or

| rename COMMENT as "since COMMENT doesn't exist, this takes almost no time"
0 Karma
.conf21 CFS Extended through 5/20!

Don't miss your chance
to share your Splunk
wisdom in-person or
virtually at .conf21!

Call for Speakers has
been extended through
Thursday, 5/20!