How to validate or prove that disabling THP improves performance?

Contributor

Hi,

We are planning to disable Transparent Huge Pages (THP) on our Splunk Cloud indexer. The issue is how to validate or prove that doing so actually improves performance.

I have tried it in our lower env and captured the following before and after disabling THP:
1. The list of processes using THP.
2. CPU usage.
3. The run time of a huge search.

But these values also vary with load, so the comparison is not conclusive.

Kindly advise.

0 Karma
1 Solution

Path Finder

When THP is enabled in our environment, we notice immediately (we are not in charge of our servers and this happens occasionally with OS patches etc). A few things that we watch/notice:
1. Our environment is very heavy on searches. If we timechart the CPU of all of our indexers in a line graph, they follow a very nice pattern: spikes at minutes 0, 15, 30, and 45, because more searches run on those minutes. The lines on the graph are close together and look almost like someone has used a rainbow pen to draw a single line. When THP is enabled, that pattern breaks down: the lines for each indexer are distinct and don't necessarily follow the same pattern. On average, CPU is a bit higher, but it is really the shape of the CPU timechart that is telling.
2. We have some scheduled searches that run every minute or every 5 minutes. We timechart the avg() or perc90() of the run time of those searches and when THP is on, that will be higher by a noticeable amount. The timechart of the avg() allows us to account for the variations in load.
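
If you want to put numbers on the comparison outside of Splunk, something like the following could work. It is only a sketch, not the timechart itself: it assumes you have exported the run times (in seconds) of the same scheduled search for a window with THP on and a window with THP off. The function names and the sample values are made up for illustration and are not measurements.

```python
import statistics


def summarize(run_times):
    """Return (mean, 90th percentile) of a list of run times in seconds."""
    ordered = sorted(run_times)
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
    p90 = statistics.quantiles(ordered, n=10, method="inclusive")[-1]
    return statistics.mean(ordered), p90


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Hypothetical run times of one scheduled search (seconds), illustration only.
    thp_on = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0, 4.0, 6.0]
    thp_off = [3.0, 4.0, 4.0, 3.0, 4.0, 5.0, 4.0, 3.0, 4.0, 4.0]

    mean_on, p90_on = summarize(thp_on)
    mean_off, p90_off = summarize(thp_off)

    print("avg change (%):", percent_change(mean_on, mean_off))
    print("p90 change (%):", percent_change(p90_on, p90_off))
```

Using many runs of the same search over a long enough window is what smooths out the load variance; a single before/after run will not tell you much.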

You can check for THP by looking at the ulimit entries in the internal logs. Splunk only logs this when it is starting up, so you will need to look at the ulimit entries written after THP has been turned off and Splunk has restarted.
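
On a host where you have shell access (so probably the lower env rather than Splunk Cloud), you could also confirm the kernel setting directly before restarting Splunk. A minimal sketch, assuming the standard sysfs location `/sys/kernel/mm/transparent_hugepage`; some older Red Hat kernels use a different directory name, so adjust the path if the files are missing. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed standard location; it may differ on some distributions.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def parse_active(text):
    """Return the bracketed (active) option, e.g. 'never' from 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_state(base=THP_DIR):
    """Read the active 'enabled' and 'defrag' settings; None if a file is absent."""
    state = {}
    for name in ("enabled", "defrag"):
        path = base / name
        state[name] = parse_active(path.read_text()) if path.exists() else None
    return state


if __name__ == "__main__":
    print(read_thp_state())
```

The kernel marks the active choice with square brackets, so `never` in brackets for `enabled` means THP is off at the OS level.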

Splunk Employee

@kamal_jagga - Did one of the answers below provide a solution to your question? If yes, please click “Accept” below the best answer to resolve this post and upvote anything that was helpful. If no, please leave a comment with more feedback. Thanks.

0 Karma

Super Champion

As you may have checked on Splunk docs - "On systems with THP enabled, Splunk has observed a minimum of a 30% degradation in indexing and search performance, with a similar percentage increase in latency. For this reason, Splunk recommends that you disable THP in your Linux system configuration unless that system runs an application that requires THP".

Some more on Splunk performance:
http://blogs.splunk.com/2014/05/07/splunk-sizing-and-performance-doing-more-with-more/

You could also check this app:
SplunkIt is a performance benchmark kit designed to provide a simplified set of performance measurements for Splunk. If you are a Splunk administrator that wants to perform a straightforward benchmark of your hardware setup, this is the tool for you.

https://splunkbase.splunk.com/app/749/

0 Karma

Contributor

Hi,

I tried this in our lower env and it seems that the results can be used to prove the benefits of disabling THP. However, the volume of the data was too low to share the specific results.

We are in the process of getting approvals to implement it in production. I will share more details from production.

Thanks for the help !!!

0 Karma

Motivator

To determine whether THP is enabled or disabled, run this search and look for an effective state of ok:

| rest /services/server/sysinfo
| table transparent_hugepages.enabled transparent_hugepages.defrag transparent_hugepages.effective_state
0 Karma

Explorer

If you look at this link ( http://docs.splunk.com/Documentation/Splunk/6.5.1/releasenotes/splunkandthp ), you can see that the things to watch are latency and the basic performance of indexing and search.

You can also follow this walkthrough to disable THP: https://answers.splunk.com/answers/188875/how-do-i-disable-transparent-huge-pages-thp-and-co.html

0 Karma