We have an issue with a large number of IOwait alerts on our Splunk indexers. After investigating, I found that no swap space is being used at any time, because none is configured. Do you know how I can enable a swap partition or swap file for a Splunk indexer? Our systemd unit and current memory figures are below.
[Service]
Type=simple
Restart=always
ExecStart=/splunk/bin/splunk _internal_launch_under_systemd
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=360
LimitNOFILE=65536
SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
User=splunk
Group=splunk
Delegate=true
CPUShares=1024
MemoryLimit=32654905344
PermissionsStartOnly=true
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/cpu/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/memory/system.slice/%n"
[Install]
WantedBy=multi-user.target
[root@splunk]# cat /proc/meminfo
MemTotal: 31889556 kB
MemFree: 1715036 kB
[root@splunk ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          31142        5835       13411        1584       11895       23308
Swap:             0           0           0
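For reference, creating and enabling a swap file on Linux typically looks like the sketch below (the size and path are only examples, and as the replies point out, adding swap may not actually help here):

# Minimal sketch: create and enable a 4 GB swap file (size/path are examples)
fallocate -l 4G /swapfile          # or: dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile                # swap files must not be world-readable
mkswap /swapfile                   # format the file as swap space
swapon /swapfile                   # enable it immediately
echo '/swapfile none swap sw 0 0' >> /etc/fstab   # enable it at boot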
Another possibility is that the IOWait alerts are false positives, especially if you started seeing them after a Splunk upgrade. Consider disabling the alert.
Swap is a bad thing. You don't want to force swapping because it means the app has to wait for its data to be moved from swap back into memory before it can be accessed.
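If swap is enabled anyway, a common mitigation is to keep the kernel's swappiness low so it swaps only under real memory pressure. A sketch, with an illustrative value:

sysctl vm.swappiness=10                                      # apply immediately (0-100, lower = swap less)
echo 'vm.swappiness=10' > /etc/sysctl.d/99-swappiness.conf   # persist across reboots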
Hi
Just like @richgalloway said, using swap with Splunk (or almost anything else) will kill the performance of the host. Especially when you already have an iowait issue, it will only get worse with swap in use.
Do you have disks with enough IOPS in your environment?
r. Ismo
Well, we have IOPS set to 16,000; I believe that is enough. We also have 3 indexer instances, and the indexers replicate to each other. We have also considered whether a disk upgrade could help us. Currently we're using gp2 disks in AWS.
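The current volume type and provisioned IOPS can be checked with the AWS CLI (a sketch; the volume ID is a placeholder):

# Show type, size, and provisioned IOPS for a volume
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
    --query 'Volumes[*].[VolumeType,Size,Iops]' --output table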
If you're getting IOWaits then perhaps 16000 is not enough, especially if it's shared by 3 indexers. Try increasing it.
Currently 16,000 IOPS is set; if we want more, we would need to change the disk type from gp3 to Provisioned IOPS (io1/io2), and it seems that would cost a lot.
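For reference, switching a volume to Provisioned IOPS can be done online with modify-volume (a sketch; the volume ID and values are placeholders):

# Change the volume type and raise the IOPS ceiling in place
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 \
    --volume-type io2 --iops 32000

Note that on gp3, throughput (MB/s) is provisioned separately from IOPS, so it may be worth confirming whether IOPS or throughput is the actual bottleneck before paying for io2.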
16k IOPS should be OK in most cases, so there is probably some other reason for the IOWait. You must find and identify that reason first and then solve it.
There are a lot of tools for seeing what is happening at the OS/"HW" level, depending on your environment. You could use your distro's standard tools like iostat, vmstat, sar, top, etc., or something else like nmon (http://nmon.sourceforge.net/pmwiki.php); I like it, as it's quite a visual way to quickly see what is happening in your system. A couple of typical invocations are sketched below.
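For example (the intervals are illustrative):

iostat -xz 5        # extended per-device stats (await, %util), non-idle devices only, every 5 s
vmstat 5            # memory, swap in/out (si/so), and CPU incl. wa (iowait), every 5 s
sar -d 5 10         # disk activity, 10 samples at 5 s intervals (sysstat package)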
r. Ismo
Well, I tried to find the issue with the nmon tool. I can see there are still a lot of read operations on the disks and the disk is quite heavily utilized, but I cannot find which job (search) is doing it. I tried to identify it via the admin interface under Activity -> Jobs, but I cannot see any search that would correlate with it. Also, interestingly, I can only see jobs which are done, not those which are currently running; every time I try to list running jobs I get no results.
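One way around the Jobs page is to query the search jobs endpoint on splunkd's management port directly, where running jobs should also appear. A sketch, assuming the default port 8089, placeholder credentials, and that jq is available:

# List currently running search jobs with their run duration
curl -sk -u admin:changeme \
    "https://localhost:8089/services/search/jobs?output_mode=json&count=0" \
    | jq -r '.entry[] | select(.content.dispatchState == "RUNNING")
             | [.name, .content.runDuration] | @tsv'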