How to validate or prove that disabling THP improves performance?

kamal_jagga
Contributor

Hi,

We are planning to disable Transparent Huge Pages (THP) on our Splunk Cloud Indexer. But the issue is how to validate/prove that it actually improves performance.

I tried it in our lower environment and captured the following before and after disabling THP:
1. List of processes using THP
2. CPU usage
3. Run time of a large search

However, these values also vary with load, which makes the before/after comparison hard to interpret.

Kindly advise.

SarahBOA
Path Finder

When THP is enabled in our environment, we notice immediately (we are not in charge of our servers and this happens occasionally with OS patches etc). A few things that we watch/notice:
1. Our environment is very heavy on searches. If we timechart the CPU of all of our indexers in a line graph, they follow a very nice pattern - spikes on the 0, 15, 30, 45 minutes due to more searches running on those minutes. The lines on the graph are close together and look almost like someone has used a rainbow pen to draw a single line. When THP is enabled, there is variation to this and the graph just doesn't look the same - the lines for each indexer are distinct and don't necessarily follow the same pattern. On average, CPU is a bit higher, but it is really the timechart of the CPU graph that is telling.
2. We have some scheduled searches that run every minute or every 5 minutes. We timechart the avg() or perc90() of the run time of those searches and when THP is on, that will be higher by a noticeable amount. The timechart of the avg() allows us to account for the variations in load.
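
To put numbers on point 2, one option is to export the run times of a scheduled search for a THP-on window and a THP-off window (same hours of day, so the load is comparable) and compare the average and 90th percentile. A minimal Python sketch follows. The sample values are hypothetical, not measurements, and the helper names `summarize` and `relative_change` are made up for illustration.

```python
from statistics import mean, quantiles


def summarize(run_times):
    """Return (mean, 90th percentile) for a list of run times in seconds."""
    # quantiles(n=10) yields 9 cut points; the last one is the 90th percentile.
    p90 = quantiles(run_times, n=10, method="inclusive")[-1]
    return mean(run_times), p90


def relative_change(before, after):
    """Percent change from before to after; negative means the value dropped."""
    return (after - before) * 100.0 / before


# Hypothetical run times (seconds) of one scheduled search.
thp_on = [12.0, 14.0, 13.0, 15.0, 16.0, 14.0, 13.0, 15.0, 18.0, 14.0]
thp_off = [10.0, 11.0, 10.0, 12.0, 12.0, 11.0, 10.0, 12.0, 13.0, 11.0]

avg_on, p90_on = summarize(thp_on)
avg_off, p90_off = summarize(thp_off)

print(f"avg change: {relative_change(avg_on, avg_off):.1f}%")
print(f"p90 change: {relative_change(p90_on, p90_off):.1f}%")
```

Comparing the same search over many runs, rather than a single huge search, is what smooths out the load variance.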

You can check for THP by looking at the ulimit entries in Splunk's internal logs. Splunk only writes these when it starts up, so look at them after THP has been turned off and Splunk has been restarted.

aaraneta_splunk
Splunk Employee

@kamal_jagga - Did one of the answers below help provide a solution to your question? If yes, please click “Accept” below the best answer to resolve this post and upvote anything that was helpful. If no, please leave a comment with more feedback. Thanks.

inventsekar
SplunkTrust

As you may have checked on Splunk docs - "On systems with THP enabled, Splunk has observed a minimum of a 30% degradation in indexing and search performance, with a similar percentage increase in latency. For this reason, Splunk recommends that you disable THP in your Linux system configuration unless that system runs an application that requires THP".

Some more on Splunk performance:
http://blogs.splunk.com/2014/05/07/splunk-sizing-and-performance-doing-more-with-more/

You could also try this app. From its description:
"SplunkIt is a performance benchmark kit designed to provide a simplified set of performance measurements for Splunk. If you are a Splunk administrator that wants to perform a straightforward benchmark of your hardware setup, this is the tool for you."

https://splunkbase.splunk.com/app/749/

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !

kamal_jagga
Contributor

Hi,

I tried this in our lower env and it seems that the results can be used to prove the benefits of disabling THP. However, the volume of the data was too low to share the specific results.

We are in the process of getting approvals to implement it in production. I will share more details from production.

Thanks for the help !!!

wrangler2x
Motivator

To determine whether THP is enabled or off, run this search (look for effective_state = ok):

| rest /services/server/sysinfo
| table transparent_hugepages.enabled transparent_hugepages.defrag transparent_hugepages.effective_state
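
If you want to cross-check the same thing at the OS level, the kernel exposes the THP settings under sysfs, with the active value shown in square brackets (for example `always madvise [never]`). Here is a rough Python sketch, assuming the standard path `/sys/kernel/mm/transparent_hugepage`; some older RHEL kernels use `/sys/kernel/mm/redhat_transparent_hugepage` instead. The function names are made up for illustration.

```python
from pathlib import Path

# Assumed default location; adjust for kernels that use a different path.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_setting(text):
    """Return the bracketed (active) value, e.g. 'never' from 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # no bracketed value found


def read_thp_state(base=THP_DIR):
    """Read the active 'enabled' and 'defrag' values; None if a file is missing."""
    state = {}
    for name in ("enabled", "defrag"):
        path = base / name
        state[name] = active_setting(path.read_text()) if path.exists() else None
    return state


if __name__ == "__main__":
    print(read_thp_state())
```

Run it on the indexer host itself. As with the search above, Splunk only picks up a change after a restart.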

viewpost_rgora
Explorer

If you look at this link ( http://docs.splunk.com/Documentation/Splunk/6.5.1/releasenotes/splunkandthp ), you can see that the things to watch are latency and the basic performance of indexing and search.

You can also follow this walkthrough to disable the THP: https://answers.splunk.com/answers/188875/how-do-i-disable-transparent-huge-pages-thp-and-co.html
