Monitoring Splunk

Search head pooling performance with NFS

Motivator

We are trying to use search head pooling (on Solaris 10/x86) with an NFS share, but we have performance issues (Splunk Web is pretty slow, and searching is very, very slow).

Using dtrace, we can see that splunkd makes a lot of open syscalls:

dtrace -n 'syscall::open*:entry /execname == "splunkd"/ {@[copyinstr(arg0)] = count(); } tick-1sec { printa(@); clear(@); }'

A lot of the opens are for the csv files in the [pathtoshared_storage]/var/run/splunk/dispatch/[job xy] directories. It looks like splunk issues an open syscall once per second for those files. I tried changing the poll.interval.rebuild and poll.interval.check settings in the [pooling] section of server.conf, but that did not seem to have an impact.

We have over 1000 directories/jobs below /var/run/splunk/dispatch. Most of the TTLs for the jobs are 600 seconds (60%) or 120 seconds (20%); some are 86400 (1 day). Should we somehow limit the number of jobs (e.g. by running multiple, smaller search head pools)?
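As a quick way to gauge the reaper's workload, one can count the job directories on the shared storage; each directory is a job the reaper must inspect on every pass. The mount path below is a placeholder:

```shell
# Count dispatch job directories on the pooled storage.
# DISPATCH is a placeholder path; adjust it to your shared-storage mount.
DISPATCH=${DISPATCH:-/mnt/splunkshead/var/run/splunk/dispatch}
find "$DISPATCH" -mindepth 1 -maxdepth 1 -type d | wc -l
```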

Thanks Chris

Update:

I added dispatch_dir_cleanup_freq = 600 to limits.conf as ewoo suggested; this did not have a noticeable impact.

Using another dtrace command:

dtrace -n 'syscall::open*:entry /execname == "splunkd"/ {@[copyinstr(arg0), ustack()] = count(); } tick-3sec { printa(@); clear(@); }'

one can see what function calls lead to the open syscalls:

/mnt/splunkshead/var/run/splunk/dispatch/1321015664.61171/metadata.csv
libc.so.1`__open+0xa
libc.so.1`open+0x10c
libc.so.1`_endopen+0xcc
libc.so.1`fopen+0x35
splunkd`_ZNK8Pathname5fopenEPKc+0x17
splunkd`_ZN14ScopedFileStarC1ERK8PathnamePKc+0x9
splunkd`_ZN13SearchResults7fromCSVERK8Pathnamemmb+0x118
splunkd`_ZN22DispatchSearchMetadataC1ERK8PathnameRK4UserRK3Strm+0x2a8
splunkd`_ZN23DispatchDirectoryReaper4tickEv+0x5d6f
splunkd`_ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval+0xb4
splunkd`_ZN9EventLoop3runEv+0xbc
splunkd`_ZN14DispatchReaper4mainEv+0x553
splunkd`_ZN6Thread8callMainEPv+0x62
libc.so.1`_thr_setup+0x5b
libc.so.1`_lwp_start

/mnt/splunkshead/var/run/splunk/dispatch/1321017866.61929/info.csv
libc.so.1`__open+0xa
libc.so.1`open+0x10c
libc.so.1`_endopen+0xcc
libc.so.1`fopen+0x35
splunkd`_ZNK8Pathname5fopenEPKc+0x17
splunkd`_ZN14ScopedFileStarC1ERK8PathnamePKc+0x9
splunkd`_ZN13SearchResults7fromCSVERK8Pathnamemmb+0x118
splunkd`_ZN17SearchResultsInfo7fromCSVERK8Pathnamebb+0x4f
splunkd`_ZN14DispatchSearch16getSearchJobInfoERK3StrR17SearchResultsInfo+0x4c
splunkd`_ZN23DispatchDirectoryReaper4tickEv+0x1b2
splunkd`_ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval+0xb4
splunkd`_ZN9EventLoop3runEv+0xbc
splunkd`_ZN14DispatchReaper4mainEv+0x553
splunkd`_ZN6Thread8callMainEPv+0x62
libc.so.1`_thr_setup+0x5b
libc.so.1`_lwp_start

/mnt/splunkshead/var/run/splunk/dispatch/1321017866.61929/metadata.csv
libc.so.1`__open+0xa
libc.so.1`open+0x10c
libc.so.1`_endopen+0xcc
libc.so.1`fopen+0x35
splunkd`_ZNK8Pathname5fopenEPKc+0x17
splunkd`_ZN14ScopedFileStarC1ERK8PathnamePKc+0x9
splunkd`_ZN13SearchResults7fromCSVERK8Pathnamemmb+0x118
splunkd`_ZN22DispatchSearchMetadataC1ERK8PathnameRK4UserRK3Strm+0x2a8
splunkd`_ZN23DispatchDirectoryReaper4tickEv+0x5d6f
splunkd`_ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval+0xb4
splunkd`_ZN9EventLoop3runEv+0xbc
splunkd`_ZN14DispatchReaper4mainEv+0x553
splunkd`_ZN6Thread8callMainEPv+0x62
libc.so.1`_thr_setup+0x5b

Does this help any of the splunk guys (ewoo) to narrow down the issue?

Splunk Employee

2ms ping is not good. For NFS to perform acceptably, you typically need sub-1ms ping. Is your NFS server on the same LAN, in the same datacenter as your search heads?

Path Finder

ewoo,

Unique serverNames, NTP, permissions, all 10GbE interfaces, flat 1-2ms RTT pings, no retransmissions or errors logged on the NFS client/server.

Changed ulimits (soft to 102400, hard to 409600) and set sysctl fs.file-max=65000000.

Added these options to the fstab mount entry: _netdev,auto,async,noatime,rsize=1048576,wsize=1048576
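For context, a complete fstab entry with those options might look like the following; the server name, export path, and mount point are placeholders:

```
nfsserver:/export/splunkshead  /mnt/splunkshead  nfs  _netdev,auto,async,noatime,rsize=1048576,wsize=1048576  0 0
```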

With that, I ran bonnie++ on a box with 128 GB of RAM and could get more than 1300 IOPS, but I'm not satisfied.

Today I'll test raising the NFS client concurrency limits:

$ cat /proc/sys/sunrpc/tcp_slot_table_entries
2   (raising to 16)
$ cat /proc/sys/sunrpc/tcp_max_slot_table_entries
65536   (raising to 6553600)

I read a Dell article saying they could reach 100,000 IOPS using the stock NFS daemon and 7200 RPM disks, but that did not inspire me much 😕

Have you tried running the NFS share off a ramfs?
(https://askubuntu.com/questions/304165/setting-up-ram-disk-nfs)

Splunk Employee

theunf: Is your NFS infrastructure sufficiently robust, based on your performance needs? If you haven't already, please check out the "Performance analysis" section here: http://docs.splunk.com/Documentation/Splunk/latest/DistSearch/Searchheadpoolingconfigurationissues

Some common issues include:
- latency to the NFS server is too high, e.g. the search heads and NFS server are in different datacenters
- concurrency settings on the NFS clients are too low

Does your NFS setup pass the validation "tests" described in the performance analysis documentation?
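A rough first pass at those checks can be run from each search head; the hostname and mount path below are placeholders, and the validation tests in the documentation are more thorough:

```shell
# Round-trip latency to the NFS server; for pooling you want sub-1ms averages.
ping -c 10 nfs-server.example.com | tail -1

# Crude metadata-latency probe against the shared mount itself.
time stat /mnt/splunkshead/var/run/splunk >/dev/null
```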

Motivator

Hi theunf, we never solved this problem. We had a remote meeting with a Splunk engineer who recommended not using that setup. I have since left that company, and as a consultant I have seen other companies try and fail with search head pooling. I think a new pooling mechanism will be released in one of the next releases. At my current job we use different search heads for different user groups. Maybe someone from Splunk can give an update on this...

Path Finder

Chris,

What's the result of it all? I'm also experiencing huge NFS performance issues.

Using mounted bundles did solve the jobs exiting with 255 errors, but the search heads are still timing out.

We're using the latest RHEL.

Communicator

We were having the same performance problems with SH pooling and NFS.

We had to change the sysctl parameter "sunrpc.tcp_slot_table_entries" from its default value of 16, to a more aggressive value of 128. This parameter controls how many RPC requests can be in-flight at once. Splunk does make a large number of NFS requests, so many that they were queuing up instead of being serviced in real-time. Increasing this value may cause the client to use many more NFSD threads on the server.
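A minimal sketch of making that change on a Linux NFS client: the runtime sysctl only takes effect for RPC transports created after it is set, so the modprobe.d entry is what persists across reboots. Both steps require root, and the paths assume a stock Linux setup:

```shell
# Raise the RPC slot table at runtime (affects newly created transports only):
sysctl -w sunrpc.tcp_slot_table_entries=128

# Make it persistent by setting a module option applied when sunrpc loads:
cat > /etc/modprobe.d/sunrpc.conf <<'EOF'
options sunrpc tcp_slot_table_entries=128
EOF
```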

A quick Google search suggests that a similar setting on Solaris might be "nfs3_max_threads".

Other tweaks include the following mount options for the NFS volume: noatime,actimeo=60

Good luck with your setup!

Motivator

Thanks for the hints. I'll have a go at that next week when I get back to the office.

Splunk Employee

It looks like splunk issues an open syscall once per second for those files.

The periodic opens on dispatch directories, especially in the absence of any search load, are likely caused by the reaper process that checks for jobs with expired TTLs.

By default, the reaper triggers every 30 seconds. You can make it trigger less often by adding this to your limits.conf:

[search]
# Reap every 10 minutes instead of every 30 seconds.
dispatch_dir_cleanup_freq = 600

I suppose it is possible that the reaper is generating so much load that it is starving search processes of I/O to the NFS mount, though that wouldn't be my first guess for a root cause here.

I tried changing the poll.interval.rebuild and poll.interval.check settings in the [pooling] section of server.conf, but that did not seem to have an impact.

I wouldn't expect these settings to have much effect on search performance, as each search runs in a separate process that must always read all conf files anew.

poll.interval.check controls how often Splunk checks conf files for updates. It doesn't affect how often Splunk touches dispatch directories.

poll.interval.rebuild does not affect disk I/O at all. It controls how often Splunk rebuilds in-memory conf datastructures.
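For reference, the two settings discussed above live in the [pooling] stanza of server.conf; the values below are illustrative, not recommendations:

```
[pooling]
# How often to check shared conf files for changes.
poll.interval.check = 1m
# How often to rebuild in-memory conf data structures.
poll.interval.rebuild = 1m
```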

Path Finder

ewoo

I wrote 1-2ms just because once in a while I think I saw 2ms.

This is a local 10Gbe LAN, all on the same site.

Each indexer is a hardware box with 128 GB of RAM and 32 of the newest CPUs, with two 10GbE interfaces joined in a bonded interface, running RHEL Server 6.5.

Splunk Employee

Have you run an I/O benchmark (e.g. bonnie++) to see
what kind of performance you get on your NFS clients?

One "gotcha" I've observed with NFS performance is that some NFS configurations produce unusually poor create/delete performance (relative to overall read/write), which significantly impacts search.
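One crude way to spot that pattern, short of a full bonnie++ run (whose -n option exercises small-file create/stat/delete), is to time file creation and deletion directly on the mount. The path below is a placeholder:

```shell
# Time creating and deleting 1000 small files on the pooled storage.
# TARGET is a placeholder; point it at a scratch directory on the NFS mount.
TARGET=${TARGET:-/mnt/splunkshead/tmp/iotest}
mkdir -p "$TARGET"
time ( i=0; while [ "$i" -lt 1000 ]; do : > "$TARGET/f$i"; i=$((i+1)); done )
time rm -f "$TARGET"/f*
```

If these loops are dramatically slower on the NFS mount than on local disk, create/delete latency is likely the bottleneck.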

Splunk Employee

Those stack traces -- DispatchDirectoryReaper::tick() -- correspond to the reaper I mentioned. Does turning down the reaper frequency at least reduce the frequency/number of syscalls you observe?

Also, I do still think the number of syscalls here might be a red herring re: your overall performance issues. Have you run an I/O benchmark (e.g. bonnie++) to see what kind of performance you get on your NFS clients?

Motivator

Hi ewoo, thanks a lot for taking the time to look into the issue. I finally got around to trying your suggestion; it did not solve the problem yet. I updated the original post with the function calls that are made before the reads. Could you have another look at it?
