Does anyone have figures for the performance impact of the CVE-2017-5754, CVE-2017-5753, and CVE-2017-5715 (Spectre/Meltdown) patches on Splunk?
Is Splunk planning to publish any official documentation on the performance impact of the Spectre/Meltdown patches, and/or provide any mitigation/remediation recommendations, given how significant the impact might be? We run RHEL kernels at our organization and are deeply concerned by some of the performance impacts reported in this thread, so any information Splunk releases would be highly appreciated.
Splunk cannot "mitigate" or "remediate" the impact of kernel patches that, essentially, turn off the CPU's speculative-execution optimizations. These flaws are inherent to the architecture of the CPU.
Your organization must determine, for itself, whether the potential risk is sufficient to warrant applying the patches. To perform that risk analysis, you need to understand the nature of the risk and then examine how likely those risks are to materialize in your environment.
Most organizations do not seem to bother with risk analysis of system vulnerabilities. Meltdown/Spectre is definitely a case where it should be performed.
I've heard you can expect a 20-50 percent performance impact, depending on search load, if you're not running as root. This has much to do with the way the patch blocks user access to the L3 cache.
hi @davpx - can you shed any more light on the "not running as root" part of your comment - I hadn't seen this noted previously and am looking for more info.
Adding a link to Red Hat's "Speculative Execution Exploit Performance Impacts" article, which describes the performance impacts of the security patches for CVE-2017-5754, CVE-2017-5753, and CVE-2017-5715.
That's the document that sent me looking for the exact figures. As far as I understand, Splunk workloads fall into the "Modest" and "Measurable" impact categories.
Adding the Splunk Blog post from Feb 12, 2018.
I would assume it would have a significant impact, due to the high disk I/O and memory caching, but I'm no kernel engineer.
I plan to patch one of our indexers this week, and I hope to report my findings.
EDIT:
I have seen a significant impact on performance when the remediation is enabled. Red Hat has an article on how to disable it.
https://access.redhat.com/articles/3311301
I wrote a tuned profile for Splunk in which you can have it run a script to disable that remediation.
https://github.com/jewnix/tuned-splunk
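If it helps anyone comparing before/after numbers: on kernels that carry the patches, the mitigation status is exposed under /sys/devices/system/cpu/vulnerabilities, so you can confirm what is actually active on each indexer. A minimal Python sketch (nothing Splunk-specific, it just reads sysfs) that I would run pre- and post-patch:

```python
#!/usr/bin/env python3
"""Print Spectre/Meltdown mitigation status from sysfs.

Requires a kernel new enough to expose
/sys/devices/system/cpu/vulnerabilities (patched RHEL and upstream
kernels do); older, unpatched kernels simply won't have the files.
"""
import os

VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"


def mitigation_status():
    """Return {vulnerability_name: kernel-reported status string}."""
    status = {}
    if not os.path.isdir(VULN_DIR):
        return status  # kernel predates the reporting interface
    for name in sorted(os.listdir(VULN_DIR)):
        with open(os.path.join(VULN_DIR, name)) as fh:
            status[name] = fh.read().strip()
    return status


if __name__ == "__main__":
    status = mitigation_status()
    if not status:
        print("No vulnerability status files found; kernel likely unpatched.")
    for name, value in status.items():
        print(f"{name}: {value}")
```

Run it once before patching and once after, and keep the output next to your performance numbers so it's clear which mitigations were on for each measurement.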
I've asked the community on Slack/IRC to weigh in with concrete data pre-/post-mitigation.
Started a Google Drive folder where people can put in their data from testing.
Included pre-patch data for part of our deployment. Anyone is free to deposit their own data.
https://drive.google.com/drive/folders/1LegN7VuOA9y8VHY5D7XjARhvVpBaciz1?usp=sharing
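To make the numbers in the drive folder more comparable, here is a crude micro-benchmark sketch you could run before and after patching. It is not a Splunk workload; it just hammers a cheap syscall, since the extra kernel entry/exit cost is where these mitigations mostly show up, so treat it only as a rough proxy alongside your real indexer metrics.

```python
#!/usr/bin/env python3
"""Crude syscall micro-benchmark to run before and after patching.

This is NOT a Splunk benchmark; it only makes the added kernel
entry/exit overhead visible by calling os.stat() in a tight loop.
Real indexer impact depends on your actual search and ingest load.
"""
import os
import time

ITERATIONS = 500_000
TARGET = "/etc/hostname"  # any small file that exists will do


def run_once():
    """Time ITERATIONS stat() calls and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        os.stat(TARGET)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Take the best of a few runs to reduce noise from other processes.
    best = min(run_once() for _ in range(5))
    print(f"{ITERATIONS} stat() calls in {best:.3f}s "
          f"({ITERATIONS / best:,.0f} calls/sec)")
```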
Any updates yet? We are scheduled for patching, so I'm just wondering if anyone has some figures on indexer performance hits.
I saw around a 50% increase in system load.
Some people in the industry I spoke to do not recommend running production servers with the patch enabled, especially databases, file servers, or any I/O-intensive load.
You can patch the system and disable the feature by using tuned, or the equivalent for your platform.
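For reference, the Red Hat article linked earlier in the thread describes runtime tunables under /sys/kernel/debug/x86 (pti_enabled, ibrs_enabled, retpoline_enabled), which is essentially what the tuned approach toggles. Below is a hedged Python sketch of the same idea; verify the exact knob names against that article for your kernel version, and remember that turning them off re-exposes the vulnerabilities.

```python
#!/usr/bin/env python3
"""Sketch: toggle the CVE-2017-5754/5715/5753 mitigations at runtime on RHEL.

The tunable names below are the debugfs knobs described in the Red Hat
article (access.redhat.com/articles/3311301); confirm they exist on
your kernel before relying on this. Writing "0" disables a mitigation,
"1" re-enables it. Must run as root with debugfs mounted.
"""
import os
import sys

DEBUGFS_X86 = "/sys/kernel/debug/x86"
TUNABLES = ("pti_enabled", "ibrs_enabled", "retpoline_enabled")


def set_tunables(value: str) -> None:
    """Write the given value to each tunable that exists on this kernel."""
    for name in TUNABLES:
        path = os.path.join(DEBUGFS_X86, name)
        if not os.path.exists(path):
            print(f"skipping {name}: not present on this kernel")
            continue
        with open(path, "w") as fh:
            fh.write(value)
        print(f"{name} set to {value}")


if __name__ == "__main__":
    if os.geteuid() != 0:
        sys.exit("run as root")
    set_tunables(sys.argv[1] if len(sys.argv) > 1 else "0")
```

Note these settings do not persist across reboots, which is why a tuned profile (or kernel boot parameters) is the cleaner way to apply them permanently.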
we all about #facts and #proof here 😉
Let's see those pre- and post-patch signals, y'all!