
How do I disable Transparent Huge Pages (THP) and confirm that it is disabled?

jwelch_splunk
Splunk Employee

I have heard that THP can be problematic for certain applications. How can it impact Splunk, and what do I need to do about it?

1 Solution

jwelch_splunk
Splunk Employee

Some Linux distros ship with THP enabled by default.

The Splunk documentation describes the effects of this here.

The Red Hat info here explains one method of disabling THP (using grub.conf) and provides ways to validate that it is disabled.

I like to follow this procedure (each sysadmin can come up with their own way to pull this off):

I run these two commands on all my Splunk servers running CentOS/Red Hat 6.x. (As noted further down the thread, RHEL 7 and other distros use /sys/kernel/mm/transparent_hugepage instead of the redhat_transparent_hugepage directory.)

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

There is no need to restart Splunk.
Then, to make these changes persist across reboots, I add this to the bottom of my /etc/rc.local:

# disable THP at boot time
if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled; then
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/redhat_transparent_hugepage/defrag; then
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
fi

I then confirm that I am not running ktune or tuned, since either one could override the settings above:

  chkconfig --list |grep tune
     ktune           0:off   1:off   2:off   3:off   4:off   5:off   6:off
     tuned           0:off   1:off   2:off   3:off   4:off   5:off   6:off

To validate that THP is disabled, I run the three commands below, or any variant you choose from here.

cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
       always madvise [never]
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
       always madvise [never]
egrep 'trans|thp' /proc/vmstat    # run this more than once and confirm that none of the values change
        nr_anon_transparent_hugepages 2
        thp_fault_alloc 12793
        thp_fault_fallback 18
        thp_collapse_alloc 70
        thp_collapse_alloc_failed 0
        thp_split 2974
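
The "none of these results are changing" check can be scripted by taking two snapshots and comparing them. This is a minimal sketch, not a definitive test: the function names, the optional file argument, and the default 10-second interval are illustrative assumptions. Counters such as nr_anon_transparent_hugepages may still move briefly while huge pages allocated before the change are released.

```shell
#!/bin/sh
# Print the THP-related counters from a vmstat-style file (default: /proc/vmstat).
thp_counters() {
    grep -E 'trans|thp' "${1:-/proc/vmstat}" | sort
}

# Take two snapshots some seconds apart and succeed (exit 0) only when
# no THP counter changed in between.
# $1 = file to read (hypothetical parameter, defaults to /proc/vmstat)
# $2 = seconds to wait between snapshots (hypothetical parameter, default 10)
thp_counters_stable() {
    file="${1:-/proc/vmstat}"
    interval="${2:-10}"
    before=$(thp_counters "$file")
    sleep "$interval"
    after=$(thp_counters "$file")
    [ "$before" = "$after" ]
}

# Usage:
#   thp_counters_stable /proc/vmstat 30 && echo "THP counters are not moving"
```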

One thing to keep in mind: on startup, Splunk logs to $SPLUNK_HOME/var/log/splunk/splunkd.log whether THP is enabled or disabled.

    grep hugetables /opt/splunk/var/log/splunk/splunkd.log
   11-18-2014 08:19:42.052 -0600 INFO  ulimit - Linux transparent hugetables support, enabled="never" defrag="never"

There are two possible concerns with this log entry:

  • On my system, /etc/rc.d/rc3.d/S90splunk runs before /etc/rc.d/rc3.d/S99local, so after a reboot the splunkd.log entry will show THP as enabled. Subsequent Splunk restarts will show the correct state.

  • This check may not be aware of tuned/ktune changing the THP settings.
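
Because the startup entry can be stale, it helps to read only the most recent entry and compare it with the live sysfs values. The sketch below assumes the log line format shown above and the /opt/splunk path from the grep example; the function name is illustrative.

```shell
#!/bin/sh
# Print the enabled= and defrag= values from the most recent
# "transparent hugetables" line in a splunkd.log file.
# $1 = log file (defaults to the /opt/splunk path used in the example above)
last_thp_log_state() {
    log="${1:-/opt/splunk/var/log/splunk/splunkd.log}"
    grep 'transparent hugetables' "$log" | tail -n 1 |
        sed -n 's/.*enabled="\([^"]*\)" defrag="\([^"]*\)".*/enabled=\1 defrag=\2/p'
}

# Usage:
#   last_thp_log_state
#   cat /sys/kernel/mm/transparent_hugepage/enabled   # compare with the live value
```

If the log reports a different state than sysfs, the log entry was most likely written before the boot-time change ran.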

In summary:

  • Splunk does not have to be restarted after making these changes for the performance gain to be realized.
  • I prefer rc.local over grub.conf because a future kernel upgrade might override my custom stanza.
  • Disabling THP improves Splunk performance and is the recommended setting.


ephemeric
Contributor

For RHEL7/CentOS7:

/etc/default/grub
GRUB_CMDLINE_LINUX="... transparent_hugepage=never"

Make your GRUB config...

Or non-persistent whilst running:

yum install libhugetlbfs-utils.x86_64
hugeadm --thp-never
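
For the GRUB route, one way to confirm after a reboot that the setting took effect is to look for the parameter on the running kernel's command line. This is a minimal sketch; the function name and the optional file argument are illustrative. On RHEL 7 the GRUB configuration is typically regenerated with grub2-mkconfig -o /boot/grub2/grub.cfg on BIOS systems (the output path differs on UEFI), so check your own layout first.

```shell
#!/bin/sh
# Succeed (exit 0) when the kernel command line contains
# transparent_hugepage=never as a separate parameter.
# $1 = command line file (hypothetical parameter, defaults to /proc/cmdline)
cmdline_disables_thp() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'transparent_hugepage=never'
}

# Usage:
#   cmdline_disables_thp && echo "THP disabled on the kernel command line"
```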

 

0 Karma

rabbidroid
Path Finder

Most answers here are from long ago, so I will add an option that works on more modern RHEL-based systems running tuned 2.11 or later.

The easiest way I have found to disable THP is with tuned.

 

mkdir /etc/tuned/splunk_idx
cat << EOF > /etc/tuned/splunk_idx/tuned.conf
#
# tuned configuration
#

[main]
summary=Optimize for Splunk Indexer
description=Configures THP for better Splunk performance
include=latency-performance

[vm]
transparent_hugepages=never
transparent_hugepage.defrag=never
EOF

tuned-adm profile splunk_idx

 

I tested this on RHEL/CentOS version 7.8+ and it works fine.
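
To confirm that the custom profile is the one in effect, the output of tuned-adm active can be checked. A small sketch, assuming the output has the form "Current active profile: NAME"; the function name is illustrative.

```shell
#!/bin/sh
# Read `tuned-adm active` output on stdin and succeed (exit 0) when the
# named profile is the active one.
# $1 = expected profile name, e.g. splunk_idx
tuned_profile_is() {
    grep -q "^Current active profile: $1\$"
}

# Usage:
#   tuned-adm active | tuned_profile_is splunk_idx && echo "profile applied"
```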

0 Karma

joshualemoine
Path Finder

If your default profile is set to virtual-guest, I'm guessing you would just use

include=virtual-guest

...right?

 

0 Karma

skoelpin
SplunkTrust

To verify whether THP is disabled, you can run this REST search in Splunk:

| rest /services/server/sysinfo

Look under the column transparent_hugepages.effective_state

0 Karma

wrangler2x
Motivator

So that you don't have to scroll:

| rest /services/server/sysinfo | table transparent_hugepages.enabled transparent_hugepages.defrag transparent_hugepages.effective_state

wrangler2x
Motivator

Here is a search that you can turn into an alert. It might be handy if you are using grub to disable it, because that could be undone by a kernel upgrade/update.

| rest /services/server/sysinfo
| stats count by transparent_hugepages.effective_state transparent_hugepages.enabled transparent_hugepages.defrag
| rename transparent_hugepages.effective_state AS State, transparent_hugepages.enabled AS Enabled, transparent_hugepages.defrag AS defrag
| eval badstatef=if(State == "ok",tonumber("0"),tonumber("1"))
| eval enabledf=if(Enabled == "never",tonumber("0"),tonumber("1"))
| table State badstatef Enabled enabledf
| where badstatef > 0 OR enabledf > 0

I run this every morning and the Time Range fields in the alert are empty -- not needed on a REST search.

gesman_splunk
Splunk Employee
0 Karma

mcederhage_splu
Splunk Employee

Yet another way of making the change survive reboots.
Tested on Ubuntu Server 16.04.2.

It uses systemd to create a service that runs once on every boot.

Source: https://blacksaildivision.com/how-to-disable-transparent-huge-pages-on-centos

Check if THP is enabled, as per above

cat /sys/kernel/mm/transparent_hugepage/enabled

cat /sys/kernel/mm/transparent_hugepage/defrag

If [always] is within brackets it is enabled

Create the disable-thp.service file

sudo nano /etc/systemd/system/disable-thp.service

Paste the following and then save the file:

[Unit]
Description=Disable Transparent Huge Pages (THP)

[Service]
Type=simple
ExecStart=/bin/sh -c "echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled && echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag"

[Install]
WantedBy=multi-user.target

Reload the systemd daemon

sudo systemctl daemon-reload

Start the service script

sudo systemctl start disable-thp

Enable the service to start at boot

sudo systemctl enable disable-thp

Check that THP is now disabled

cat /sys/kernel/mm/transparent_hugepage/enabled

cat /sys/kernel/mm/transparent_hugepage/defrag

If [never] is within brackets it is now disabled
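
The bracket check can be scripted so that it returns just the active value. A minimal sketch; the function name is illustrative, and the file to read is passed as an argument.

```shell
#!/bin/sh
# Print the value shown in square brackets in a THP sysfs file,
# e.g. "never" for a file containing "always madvise [never]".
# $1 = file to read, e.g. /sys/kernel/mm/transparent_hugepage/enabled
thp_active() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Usage:
#   [ "$(thp_active /sys/kernel/mm/transparent_hugepage/enabled)" = "never" ] \
#       && echo "THP is disabled"
```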

DUThibault
Contributor

Seems to work for CentOS 7 too.

0 Karma

romedome
Path Finder

This works very nicely!

0 Karma

stefan1988
Path Finder

Run the following query in Splunk:

| rest splunk_server=local /services/server/info 
| join type=outer splunk_server [rest splunk_server=local /services/server/sysinfo | fields splunk_server transparent_hugepages.*] 
| eval transparent_hugepages.effective_state = if(isnotnull('transparent_hugepages.effective_state'), 'transparent_hugepages.effective_state', "unknown") 
| eval transparent_hugepages.enabled = case(len('transparent_hugepages.enabled') > 0, 'transparent_hugepages.enabled', 'transparent_hugepages.effective_state' == "ok" AND (isnull('transparent_hugepages.enabled') OR len('transparent_hugepages.enabled') = 0), "feature not available", 'transparent_hugepages.effective_state' == "unknown" AND isnull('transparent_hugepages.enabled'), "unknown") 
| eval transparent_hugepages.defrag = case(len('transparent_hugepages.defrag') > 0, 'transparent_hugepages.defrag', 'transparent_hugepages.effective_state' == "ok" AND (isnull('transparent_hugepages.defrag') OR len('transparent_hugepages.defrag') = 0), "feature not available", 'transparent_hugepages.effective_state' == "unknown" AND isnull('transparent_hugepages.defrag'), "unknown") 
| eval severity_level = case('transparent_hugepages.effective_state' == "unavailable", -1, 'transparent_hugepages.effective_state' == "ok", 0, 'transparent_hugepages.effective_state' == "unknown", 1, 'transparent_hugepages.effective_state' == "bad", 2) 
| fields splunk_server transparent_hugepages.enabled transparent_hugepages.defrag transparent_hugepages.effective_state severity_level 
| rename splunk_server AS instance 
| fields - _timediff

koshyk
Super Champion

Just to add to the messages above: in large enterprise systems I would NOT do this per server, but rather from a central Satellite (or orchestration server), to ensure all Splunk systems are consistent and the setting is NOT overwritten.

E.g., to implement via Puppet:

# 'provider => shell' runs each command through /bin/sh so the '>' redirection works.
# The 'unless' guard skips the write when [never] is already the active value.
exec { "disable_transparent_hugepage_enabled":
  command  => "/bin/echo never > /sys/kernel/mm/transparent_hugepage/enabled",
  unless   => "/bin/grep -c '\[never\]' /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null",
  provider => shell,
}

exec { "disable_transparent_hugepage_defrag":
  command  => "/bin/echo never > /sys/kernel/mm/transparent_hugepage/defrag",
  unless   => "/bin/grep -c '\[never\]' /sys/kernel/mm/transparent_hugepage/defrag 2>/dev/null",
  provider => shell,
}
0 Karma

kamal_jagga
Contributor

Can anyone suggest an easy way to validate whether disabling THP improved performance?

0 Karma

cmeo
Contributor

Putting it in rc.local is too late. With the legacy SysV init script that splunk enable boot-start deploys, Splunk has already started by then.

What I do is modify /etc/init.d/splunk as follows:
Add function:

RETVAL=0 #existing

# disable hugepages

disable_huge() {
  echo "disabling huge page support"
  if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
     echo never > /sys/kernel/mm/transparent_hugepage/enabled
  fi
  if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
     echo never > /sys/kernel/mm/transparent_hugepage/defrag
  fi
}

Use it:

case "$1" in
  start)
    disable_huge
    splunk_start
    ;;
  stop)
    splunk_stop
    ;;
  restart)
    disable_huge
    splunk_restart
    ;;

Tried all the other things here and in places like stackoverflow, seems to be the only thing that actually works.
Yes we should all be using systemd methodology everywhere, but you know what? CBF learning yet another OS startup gadget, of which the woods seem to be full.

jwelch_splunk
Splunk Employee

As I mentioned in my original posting, THP can be disabled on the fly "without restarting splunk", and you are correct in normal SysV we would be disabling THP with rc.local after splunk has started, as I mentioned in the article as well. However it is only enabled for about 1 second before rc.local turns it back off.

Also using grub.conf is an option as well.

wrangler2x
Motivator

On RHEL 7 the paths used for verification are different:

# cat /sys/kernel/mm/transparent_hugepage/defrag
[always] madvise never

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
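
Since the directory name differs between RHEL 6 and later releases, a script that has to run on both can probe for whichever one exists. A minimal sketch; the function name and the optional root-prefix argument (useful only for testing) are illustrative.

```shell
#!/bin/sh
# Print the THP sysfs directory present on this system: the upstream path
# used on RHEL 7 and most distros, or the RHEL/CentOS 6 variant.
# $1 = optional root prefix (hypothetical parameter, defaults to empty)
thp_dir() {
    root="${1:-}"
    for d in /sys/kernel/mm/transparent_hugepage \
             /sys/kernel/mm/redhat_transparent_hugepage; do
        if [ -d "$root$d" ]; then
            echo "$root$d"
            return 0
        fi
    done
    return 1
}

# Usage:
#   dir=$(thp_dir) && cat "$dir/enabled" "$dir/defrag"
```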

DUThibault
Contributor

Under CentOS 7, sadly, no /etc/init.d/splunk file.

0 Karma

vliggio
Communicator

Just run this and it creates the script:

/opt/splunk/bin/splunk enable boot-start

0 Karma

sloshburch
Splunk Employee

I'm pretty sure init.d refers to the startup script, and you would have to create such a file. The OS does not come with one out of the box. Not sure if that helps.

0 Karma

cmeo
Contributor

The grub.conf solution, as someone else already pointed out, can be overwritten by a kernel update.
What might help is a proper systemd-type service to do this as a dependency for splunk before it starts (and a lot of other software doesn't like THP either so this would be widely useful).
Alternatively some proper THP controls created upstream in the distros, but I'm not holding my breath. hugeadm in ubuntu is a step in the right direction but that doesn't work properly either as I discovered when I tried to use it. It's not sticky IIRC or perhaps it was some other problem--I don't recall. And it doesn't help you with any other distro anyway.
Yet another possibility would be for Splunk to actually address the problem and update the boot script apparatus. The ulimit stuff is out of date too. The only thing I've found that actually worked is here:
http://stackoverflow.com/questions/39506149/ubuntu-16-04-systemd-redis-issues-with-ulimit
and this is a lot of messing around to achieve what used to be easy but now isn't.
This still leaves you with a problem if someone runs $SPLUNK_HOME/bin/splunk start or restart without a wrapper script. All in all quite exasperating and I freely admit I've created an ugly hack.
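
A systemd service ordered before Splunk, as suggested above, might look like the following. This is a hedged sketch rather than a tested recipe: the Splunk unit names in Before= (Splunkd.service and splunk.service) are assumptions to be checked against your own system, and the function name and destination argument are illustrative. Type=oneshot with RemainAfterExit=yes lets systemd treat the unit as done only after both writes finish.

```shell
#!/bin/sh
# Write a one-shot unit that disables THP and is ordered before Splunk.
# $1 = destination file (defaults to /etc/systemd/system/disable-thp.service)
write_thp_unit() {
    dest="${1:-/etc/systemd/system/disable-thp.service}"
    cat > "$dest" <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages (THP)
# Assumption: adjust these names to match your Splunk unit
# (see: systemctl list-unit-files | grep -i splunk).
Before=Splunkd.service splunk.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag"

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   write_thp_unit
#   systemctl daemon-reload
#   systemctl enable disable-thp
```

As noted above, this still does not cover someone running $SPLUNK_HOME/bin/splunk start by hand before the unit has run.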

0 Karma