Solved: search with joins and append takes too long

claatu · ‎01-21-2018

The goal is to determine, from specific vulnerabilities found in scans, the percentage that have been 'fixed', meaning they are not found in the latest scans. First we get the vulns for all time, then check which of them appear in the latest scans. So alltime - latest = fixed, and fixed / alltime * 100 = percentage fixed.
The search takes way too long. How can I speed it up?
Here is the approach.

| eval comment="now do the calculation"
| eval progress=(tot_fix / (tot_fix + tot_persist)) * 100
| table progress
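
To sanity-check that last step in isolation, here is a minimal sketch with made-up counts. The 150 and 50 are illustrative values, not real scan data; tot_fix + tot_persist plays the role of "alltime" from the goal statement.

```
| makeresults
| eval tot_fix=150, tot_persist=50
| eval progress=(tot_fix / (tot_fix + tot_persist)) * 100
| table tot_fix tot_persist progress
```

With these numbers progress comes out to 75.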

I also tried an approach that put the last_scan_finished per asset in a lookup table, avoiding the redundant search, but that did not speed it up much (and would introduce the need to keep updating a lookup table).

Ideas?

martin_mueller · ‎01-21-2018

I'm not going to attempt to follow a wall of search without knowing what your data looks like or what the job inspector output is, so here are some general pointers after skimming over:

Use the comment macro instead of eval comment="...". You should see a big drop in the eval time reported by the job inspector. Read http://docs.splunk.com/Documentation/Splunk/7.0.1/Search/Addcommentstosearches
Don't use | join field [| inputlookup foo.csv]; use | lookup foo.csv field instead.
Don't use sourcetype=A | ... | append [search sourcetype=A]; you're just loading the same data twice. Read https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...
Restrict fields before mvexpand.
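
A rough sketch of how these pointers might fit together in one pass. It is not the original search: asset, cve, scan_finished and the lookup output column priority are hypothetical names, and it assumes every scan of an asset produces at least one event carrying that scan's scan_finished value. last_scan_finished, tot_fix and tot_persist are borrowed from the question.

```
sourcetype=A `comment("single pass over the scan data, no append of the same sourcetype")`
| fields asset, cve, scan_finished
| eventstats max(scan_finished) AS last_scan_finished BY asset
| mvexpand cve
| lookup foo.csv cve OUTPUT priority
| where isnotnull(priority)
| eval in_latest=if(scan_finished==last_scan_finished, 1, 0)
| stats max(in_latest) AS in_latest BY asset, cve
| stats count(eval(in_latest==1)) AS tot_persist, count(eval(in_latest==0)) AS tot_fix
```

The eventstats runs before the lookup filter so that the latest scan per asset is computed over all of that asset's events, not only the ones that match the lookup. Each asset/cve pair is then counted as persisting if it shows up in the latest scan and as fixed if it only shows up in earlier ones.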

elliotproebstel · ‎01-22-2018

I'm adding this as an answer because I can't add a screenshot to a comment - but this is in response to your thought that you couldn't configure a lookup without file access to transforms.conf. You actually can achieve the same thing through the UI. Go to Settings > Lookups, and that will take you to this menu:

Under Add New, you'll find the settings page to configure your existing CSV as a full lookup, allowing you to use the lookup command.

elliotproebstel · ‎01-23-2018

Here's the screenshot again:

MuS · ‎01-23-2018

Thanks for uploading the screenshot again!

MuS · ‎01-23-2018

@elliotproebstel, sorry I accidentally deleted the screenshot while trying to make it show 😞

martin_mueller · ‎01-25-2018

Yeah, lookup doesn't filter on its own, but you can filter based on its output fields.

Additionally, if you extract your cve field as multivalue right away, you can automatically apply the lookup and filter based on a lookup output field in the generating search - before even the mvexpand.

Alternatively, if your lookup is a strong restriction (only a few events match the lookup), you could consider this pattern: sourcetype=alpha [| inputlookup PriorityCVE.csv | rename CVE as cve]
That will only load events that have a matching cve value. It also requires extracting the cve field as multivalue right away.
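
For the first point, a minimal sketch of filtering on a lookup output field. It assumes the CSV column is named CVE (as in the pattern above) and that cve is already extracted on the event; priority_cve is a made-up output field name.

```
sourcetype=alpha
| lookup PriorityCVE.csv CVE AS cve OUTPUT CVE AS priority_cve
| where isnotnull(priority_cve)
```

Events whose cve has no row in the CSV get no priority_cve value, so the where drops them. The lookup itself never removes anything.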

claatu · ‎01-23-2018

I believe lookup does not filter out events, unless you use a where afterwards. And it is only for the lookup table (not the subsearch), and the table is small, so I think not much gain there.

I was able to eliminate the append search, which greatly sped the thing up, and that is somewhat related to the 129424 answer linked. I restricted fields before the mvexpand. I also changed the end calculation to account for possible zeroes (lack of a status type).

Here is the latest (final?) version with some abbreviation.

martin_mueller · ‎01-21-2018

The macro takes zero computational effort per event because it's removed before the search is launched.
The macro can be placed virtually anywhere, not just between commands.
The macro is the documented approach, easing transition between environments or users.
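
As an illustration of the placement point, a small sketch. The sourcetype and the cve field are placeholders, and it assumes the comment macro that ships with the Search app is available to you.

```
sourcetype=alpha `comment("macro sits inside the base search")`
| stats count `comment("and here in the middle of a command")` BY cve
```

Both macro calls are expanded to nothing before the search is dispatched, so what actually runs is just the base search and the stats.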

claatu · ‎01-22-2018

I just added the eval comment thing for this posting, I don't actually have it in the real search. But will keep the macro thing in mind for future posts.

When I read about "lookup", it said the table name had to be in transforms.conf, which I don't have access to.

Will take a look at the link about multiple sourcetypes, but I have read plenty of search-compare type articles and still haven't found the magic.

DalJeanis · ‎01-21-2018

or

| rename COMMENT as "since COMMENT doesn't exist, this takes almost no time"
