Splunk Enterprise Security

Editing Notable Events is running slow

martin_mueller
SplunkTrust
SplunkTrust

I'm experiencing quite slow executions of host:8000/custom/SA-ThreatIntelligence/notable_events/update_status when editing a notable event through the Incident Review view. Here's a Gantt chart of a slow-feeling execution:

alt text

Built based on notable_event_update_rest_handler.log and reviewstatuses_rest_handler.log. The biggest offenders are getCorrelationSearches, commentLengthRequired, isUrgencyOverrideAllowed, getStatusLabelMap and getDefaultStatus, the others are just calling those and inheriting their slowness.
Those all sound as if they're basically loading configuration - any hint on speeding that up or further debugging?

Splunk 6.1.4 on Windows 2008R2, running ES 3.1.1.

LukeMurphey
Champion

You are correct, the redundant calls should not be performed. Look to a future maintenance release of ES to improve performance (we actually have some other things we are looking into that should improve performance).

martin_mueller
SplunkTrust
SplunkTrust

3.2.1 released a few days ago still uses this extremely slow import. A minute of fiddling with a few .py files after updating takes https://localhost:8000/en-US/custom/SA-ThreatIntelligence/notable_info/all from 13.2s to 10s, for example. Still terrible, but it was 32% more terrible before the fiddling...

Also, it seems there are slow calls being made from the main UI thread synchronously, as complained about by my browser:

alt text

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

"Progress" is all I wanted to hear, thanks 🙂

alt text

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Hey Luke,

Any news about that maintenance release?

0 Karma

LukeMurphey
Champion

I can't be very specific but I can tell you that it should be soon. I actually reviewed the change-list for this particular problem myself. Sorry for having to be cagey about the dates.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

An example of optimization potential, the Incident Review view loads a list of Notable Event owners for the filter in the top left corner via a search... and calls a custom .py endpoint to load a list of Notable Event owners for the log review popup thingy. That'll also help declutter code redundancy... and make customization of that list easier.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Looking forward as hard as my eyes can take 😉

While you're at it, are there plans to reduce the looking-for-summaries overhead that occurs when launching a search? Apparently all saved searches visible in the current context are examined, taking 30ms-ish per saved search on a testing instance. With hundreds of globally visible saved searches in ES, that adds up to several seconds of overhead for every search. Here's a pastebin, look around line 146ish: http://pastebin.com/hg3AtJCe
It doesn't explicitly say "going through all visible saved searches", but I've reproduced that with a naked Splunk. No saved searches, 0.1s for | stats count - 75 saved searches, 0.5s - 150 saved searches, 1.0s. Turning them app-visible and running the search in a different app behaves as if the saved searches didn't exist, so the globally visible ones are the issue.

I fear making that faster underneath is a core Splunk issue rather than ES, so in the meantime - what would happen if one were to make the saved searches underneath a correlation search app-visible only? Didn't get around to test that on a different machine yet 😞

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Replacing this

from splunk.appserver.mrsparkle.lib.util import make_splunkhome_path

with this

#from splunk.appserver.mrsparkle.lib.util import make_splunkhome_path

def make_splunkhome_path(args):
    return os.path.join(os.getenv("SPLUNK_HOME"), *args)

(and importing os where necessary) in a dozen or two .py files in SA-ThreatIntelligence/bin has shaved eight seconds off the REST call for saving notable events.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Update: As documented in #splunk, I've dug a little deeper using commentLengthRequired as my guinea pig.

Underneath it seems to be calling | rest /services/alerts/log_review/comment (not through the search command of course), which is taking up the biggest chunk of the time. I've replicated this on my laptop with just the SA-ThreatIntelligence app enabled to turn off any other noise that might add time... still takes ages for such a simple config call.

Digging deeper into log_review_rest_handler.py, apparently two seconds each are spent doing who knows what before entering either the setup or handleList method:

127.0.0.1 - admin [06/Nov/2014:13:12:17.666 +0100] "GET /services/alerts/log_review/comment?count=0 HTTP/1.0" 200 4172 - - - 7813ms
2014-11-06 13:12:19,348 DEBUG In setup
2014-11-06 13:12:19,348 DEBUG Out setup
2014-11-06 13:12:21,598 DEBUG In handleList

You can see I'm calling the REST endpoint at 13:12:17, setup gets called two seconds later and finishes in no time. Another two seconds later handleList gets called - those four seconds are spent somewhere in between the various layers of Splunkiness, and happen on several of these custom REST calls. In total, they add up to most of the time I'm seeing the notable_events/update_status consume.