Hi,
We are planning to disable Transparent Huge Pages (THP) on our Splunk Cloud indexers. The question is how to validate that doing so actually improves performance.
I tried it in our lower environment and captured the following before and after disabling THP:
1. List of processes using THP
2. CPU usage
3. Run time of a large search
However, these values also vary with load, so a single before/after comparison is not conclusive.
Kindly advise.
When THP is enabled in our environment, we notice immediately (we are not in charge of our servers and this happens occasionally with OS patches etc). A few things that we watch/notice:
1. Our environment is very heavy on searches. If we timechart the CPU of all of our indexers in a line graph, they follow a very nice pattern - spikes on the 0, 15, 30, 45 minutes due to more searches running on those minutes. The lines on the graph are close together and look almost like someone has used a rainbow pen to draw a single line. When THP is enabled, there is variation to this and the graph just doesn't look the same - the lines for each indexer are distinct and don't necessarily follow the same pattern. On average, CPU is a bit higher, but it is really the timechart of the CPU graph that is telling.
2. We have some scheduled searches that run every minute or every 5 minutes. We timechart the avg() or perc90() of the run time of those searches and when THP is on, that will be higher by a noticeable amount. The timechart of the avg() allows us to account for the variations in load.
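To make the second point concrete, here is a minimal sketch of comparing run times before and after the change using the median and 90th percentile rather than a single run, so that load variation is averaged out. The function names and the sample numbers are hypothetical, used only for illustration. The run times would come from your own exported scheduled-search data.

```python
import statistics


def summarize(run_times):
    """Return (median, p90) for a list of search run times in seconds."""
    ordered = sorted(run_times)
    median = statistics.median(ordered)
    # Nearest-rank 90th percentile: ceil(0.9 * n) computed with integers.
    rank = (9 * len(ordered) + 9) // 10
    p90 = ordered[rank - 1]
    return median, p90


def compare(before, after):
    """Percentage change of median and p90 from 'before' to 'after'."""
    b_med, b_p90 = summarize(before)
    a_med, a_p90 = summarize(after)
    return {
        "median_change_pct": 100.0 * (a_med - b_med) / b_med,
        "p90_change_pct": 100.0 * (a_p90 - b_p90) / b_p90,
    }


if __name__ == "__main__":
    # Hypothetical run times in seconds, NOT real measurements.
    thp_on = [12.1, 11.8, 13.0, 12.5, 14.2, 12.9, 13.4, 12.2, 15.0, 12.7]
    thp_off = [9.8, 10.1, 9.5, 10.4, 11.0, 9.9, 10.2, 9.7, 11.5, 10.0]
    print(compare(thp_on, thp_off))
```

A negative percentage means the searches got faster after the change. The more runs you collect over comparable time windows (same hours, same days of the week), the less a load spike will distort the comparison.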
You can check the THP state by looking at the ulimit entries in Splunk's internal logs. Splunk only records this when it starts up, so after THP has been turned off you need to restart Splunk and then look at the new entries.
Lots of good comments here: https://answers.splunk.com/answers/188875/how-do-i-disable-transparent-huge-pages-thp-and-co.html
As you may have checked on Splunk docs - "On systems with THP enabled, Splunk has observed a minimum of a 30% degradation in indexing and search performance, with a similar percentage increase in latency. For this reason, Splunk recommends that you disable THP in your Linux system configuration unless that system runs an application that requires THP".
Some more on Splunk performance:
http://blogs.splunk.com/2014/05/07/splunk-sizing-and-performance-doing-more-with-more/
You could also try this app:
SplunkIt is a performance benchmark kit designed to provide a simplified set of performance measurements for Splunk. If you are a Splunk administrator that wants to perform a straightforward benchmark of your hardware setup, this is the tool for you.
https://splunkbase.splunk.com/app/749/
Hi,
I tried this in our lower environment and it seems the results can be used to show the benefit of disabling THP. However, the data volume there was too low for the specific results to be worth sharing.
We are in the process of getting approvals to implement it in production. I will share more details from production.
Thanks for the help!
To determine whether THP is enabled or off, run this search (look for effective_state = ok):
| rest /services/server/sysinfo
| table transparent_hugepages.enabled transparent_hugepages.defrag transparent_hugepages.effective_state
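As a cross-check from the operating system side, the kernel exposes the THP setting under sysfs, with the active option shown in square brackets (for example `always madvise [never]`). The sketch below assumes the standard path /sys/kernel/mm/transparent_hugepage, since some older distributions use a different directory. The function names are hypothetical.

```python
from pathlib import Path

# Assumed standard location; older distributions may use another directory.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_setting(text):
    """Return the bracketed option from a line such as 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_state(base=THP_DIR):
    """Read the 'enabled' and 'defrag' files; None if a file is missing."""
    state = {}
    for name in ("enabled", "defrag"):
        path = Path(base) / name
        state[name] = active_setting(path.read_text()) if path.exists() else None
    return state


if __name__ == "__main__":
    print(thp_state())
```

If both values come back as `never`, THP is off at the OS level. Splunk still needs a restart before its own view of the state is refreshed.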
According to this page ( http://docs.splunk.com/Documentation/Splunk/6.5.1/releasenotes/splunkandthp ), the things to watch are latency and the basic performance of indexing and search.
You can also follow this walkthrough to disable THP: https://answers.splunk.com/answers/188875/how-do-i-disable-transparent-huge-pages-thp-and-co.html