<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Search head pooling performance with NFS in Monitoring Splunk</title>
    <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98250#M1179</link>
    <description>&lt;P&gt;Hi ewoo, thanks a lot for taking the time to look into the issue. I finally got around to trying your suggestion, but it has not solved the problem yet. I updated the original post with lists of the function calls that are made before the reads. Could you have another look at it?&lt;/P&gt;</description>
    <pubDate>Sat, 12 Nov 2011 14:21:03 GMT</pubDate>
    <dc:creator>chris</dc:creator>
    <dc:date>2011-11-12T14:21:03Z</dc:date>
    <item>
      <title>Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98248#M1177</link>
      <description>&lt;P&gt;We are trying to use search head pooling (on Solaris 10/x86) with an NFS share but we have performance issues (Splunk Web is pretty slow, searching is very very slow). &lt;/P&gt;

&lt;P&gt;Using dtrace, we see that splunkd makes a lot of open syscalls:&lt;BR /&gt;&lt;BR /&gt;
&lt;CODE&gt;dtrace -n 'syscall::open*:entry /execname == "splunkd"/ {@[copyinstr(arg0)] = count(); } tick-1sec { printa(@); clear(@); }'&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;A lot of the opens are due to the csv files in the [path_to_shared_storage]/var/run/splunk/dispatch/[job xy] directories. It looks like Splunk issues an open syscall once per second for those files. I tried changing the &lt;CODE&gt;poll.interval.rebuild&lt;/CODE&gt; and &lt;CODE&gt;poll.interval.check&lt;/CODE&gt; settings in the &lt;CODE&gt;[pooling]&lt;/CODE&gt; section of &lt;CODE&gt;server.conf&lt;/CODE&gt;, but that did not seem to have an impact.&lt;/P&gt;

&lt;P&gt;We have over 1000 directories/jobs below [path_to_shared_storage]/var/run/splunk/dispatch. Most of the TTLs for the jobs are 600 (60%) and 120 (20%) seconds; some are 86400 (1 day). Should we somehow limit the number of jobs (for example, by having multiple sets of search head pools)?&lt;/P&gt;

&lt;P&gt;Thanks Chris&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Update:&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;
I added &lt;CODE&gt;dispatch_dir_cleanup_freq = 600&lt;/CODE&gt; to &lt;CODE&gt;limits.conf&lt;/CODE&gt; as ewoo suggested, but this did not have a noticeable impact.&lt;/P&gt;

&lt;P&gt;Using another dtrace command: &lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;dtrace -n 'syscall::open*:entry /execname == "splunkd"/  {@[copyinstr(arg0), ustack()] = count(); } tick-3sec { printa(@); clear(@); }'&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;one can see which function calls lead to the open syscalls:&lt;BR /&gt;
&lt;CODE&gt;&lt;BR /&gt;
/mnt/splunkshead/var/run/splunk/dispatch/1321015664.61171/metadata.csv&lt;BR /&gt;
              libc.so.1`__open+0xa&lt;BR /&gt;
              libc.so.1`open+0x10c&lt;BR /&gt;
              libc.so.1`_endopen+0xcc&lt;BR /&gt;
              libc.so.1`fopen+0x35&lt;BR /&gt;
              splunkd`_ZNK8Pathname5fopenEPKc+0x17&lt;BR /&gt;
              splunkd`_ZN14ScopedFileStarC1ERK8PathnamePKc+0x9&lt;BR /&gt;
              splunkd`_ZN13SearchResults7fromCSVERK8Pathnamemmb+0x118&lt;BR /&gt;
              splunkd`_ZN22DispatchSearchMetadataC1ERK8PathnameRK4UserRK3Strm+0x2a8&lt;BR /&gt;
              splunkd`_ZN23DispatchDirectoryReaper4tickEv+0x5d6f&lt;BR /&gt;
              splunkd`_ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval+0xb4&lt;BR /&gt;
              splunkd`_ZN9EventLoop3runEv+0xbc&lt;BR /&gt;
              splunkd`_ZN14DispatchReaper4mainEv+0x553&lt;BR /&gt;
              splunkd`_ZN6Thread8callMainEPv+0x62&lt;BR /&gt;
              libc.so.1`_thr_setup+0x5b&lt;BR /&gt;
              libc.so.1`_lwp_start&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;/mnt/splunkshead/var/run/splunk/dispatch/1321017866.61929/info.csv&lt;BR /&gt;
              libc.so.1`__open+0xa&lt;BR /&gt;
              libc.so.1`open+0x10c&lt;BR /&gt;
              libc.so.1`_endopen+0xcc&lt;BR /&gt;
              libc.so.1`fopen+0x35&lt;BR /&gt;
              splunkd`_ZNK8Pathname5fopenEPKc+0x17&lt;BR /&gt;
              splunkd`_ZN14ScopedFileStarC1ERK8PathnamePKc+0x9&lt;BR /&gt;
              splunkd`_ZN13SearchResults7fromCSVERK8Pathnamemmb+0x118&lt;BR /&gt;
              splunkd`_ZN17SearchResultsInfo7fromCSVERK8Pathnamebb+0x4f&lt;BR /&gt;
              splunkd`_ZN14DispatchSearch16getSearchJobInfoERK3StrR17SearchResultsInfo+0x4c&lt;BR /&gt;
              splunkd`_ZN23DispatchDirectoryReaper4tickEv+0x1b2&lt;BR /&gt;
              splunkd`_ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval+0xb4&lt;BR /&gt;
              splunkd`_ZN9EventLoop3runEv+0xbc&lt;BR /&gt;
              splunkd`_ZN14DispatchReaper4mainEv+0x553&lt;BR /&gt;
              splunkd`_ZN6Thread8callMainEPv+0x62&lt;BR /&gt;
              libc.so.1`_thr_setup+0x5b&lt;BR /&gt;
              libc.so.1`_lwp_start&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;/mnt/splunkshead/var/run/splunk/dispatch/1321017866.61929/metadata.csv&lt;BR /&gt;
              libc.so.1`__open+0xa&lt;BR /&gt;
              libc.so.1`open+0x10c&lt;BR /&gt;
              libc.so.1`_endopen+0xcc&lt;BR /&gt;
              libc.so.1`fopen+0x35&lt;BR /&gt;
              splunkd`_ZNK8Pathname5fopenEPKc+0x17&lt;BR /&gt;
              splunkd`_ZN14ScopedFileStarC1ERK8PathnamePKc+0x9&lt;BR /&gt;
              splunkd`_ZN13SearchResults7fromCSVERK8Pathnamemmb+0x118&lt;BR /&gt;
              splunkd`_ZN22DispatchSearchMetadataC1ERK8PathnameRK4UserRK3Strm+0x2a8&lt;BR /&gt;
              splunkd`_ZN23DispatchDirectoryReaper4tickEv+0x5d6f&lt;BR /&gt;
              splunkd`_ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval+0xb4&lt;BR /&gt;
              splunkd`_ZN9EventLoop3runEv+0xbc&lt;BR /&gt;
              splunkd`_ZN14DispatchReaper4mainEv+0x553&lt;BR /&gt;
              splunkd`_ZN6Thread8callMainEPv+0x62&lt;BR /&gt;
              libc.so.1`_thr_setup+0x5b&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Does this help any of the splunk guys (ewoo) to narrow down the issue?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 10:01:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98248#M1177</guid>
      <dc:creator>chris</dc:creator>
      <dc:date>2020-09-28T10:01:22Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98249#M1178</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;It looks like splunk is causing a open&lt;BR /&gt;
syscall once per second for those&lt;BR /&gt;
files.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;The periodic opens on dispatch directories, especially in the absence of any search load, are likely caused by the reaper process that checks for jobs with expired TTLs.&lt;/P&gt;

&lt;P&gt;By default, the reaper triggers every 30 seconds. You can make it trigger less often by adding this to your limits.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[search]
# Reap every 10 minutes instead of every 30 seconds.
dispatch_dir_cleanup_freq = 600
&lt;/CODE&gt;&lt;/PRE&gt;
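
&lt;P&gt;As a rough back-of-the-envelope check, the average number of opens the reaper causes can be estimated from the number of job directories and the reaper interval. This is only a sketch: the figure of 1000 job directories comes from the question, while the two files opened per job on each pass is an assumption, and the function name is hypothetical. The real reaper issues its opens in a burst on each pass rather than spreading them evenly.&lt;/P&gt;

```python
def reaper_opens_per_second(job_dirs, files_per_job, interval_seconds):
    """Average open() calls per second caused by one reaper pass per interval."""
    return job_dirs * files_per_job / interval_seconds


# Assumed inputs: 1000 job directories, 2 files opened per job on each pass.
default_rate = reaper_opens_per_second(1000, 2, 30)
tuned_rate = reaper_opens_per_second(1000, 2, 600)
print(f"every 30 s: {default_rate:.1f} opens/s, every 600 s: {tuned_rate:.1f} opens/s")
```

&lt;P&gt;Under these assumptions the average drops from about 66.7 to about 3.3 opens per second when the interval goes from 30 to 600 seconds.&lt;/P&gt;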

&lt;P&gt;I suppose it is possible that the reaper is generating so much load that it is starving search processes of I/O to the NFS mount, though that wouldn't be my first guess for a root cause here.&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;I tried changeing the&lt;BR /&gt;
poll.interval.rebuild and&lt;BR /&gt;
poll.interval.check settings in the&lt;BR /&gt;
[pooling] section of server.conf. But&lt;BR /&gt;
that did not seem to have an impact.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;I wouldn't expect these settings to have much effect on search performance, because each search runs in a separate process that must always read all conf files anew.&lt;/P&gt;

&lt;P&gt;poll.interval.check controls how often Splunk checks conf files for updates. It doesn't affect how often Splunk touches dispatch directories.&lt;/P&gt;

&lt;P&gt;poll.interval.rebuild does not affect disk I/O at all. It controls how often Splunk rebuilds in-memory conf data structures.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2011 17:48:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98249#M1178</guid>
      <dc:creator>ewoo</dc:creator>
      <dc:date>2011-10-28T17:48:13Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98250#M1179</link>
      <description>&lt;P&gt;Hi ewoo, thanks a lot for taking the time to look into the issue. I finally got around to trying your suggestion, but it has not solved the problem yet. I updated the original post with lists of the function calls that are made before the reads. Could you have another look at it?&lt;/P&gt;</description>
      <pubDate>Sat, 12 Nov 2011 14:21:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98250#M1179</guid>
      <dc:creator>chris</dc:creator>
      <dc:date>2011-11-12T14:21:03Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98251#M1180</link>
      <description>&lt;P&gt;Those stack traces -- DispatchDirectoryReaper::tick() -- correspond to the reaper I mentioned. Does turning down the reaper frequency at least reduce the frequency/number of syscalls you observe?&lt;/P&gt;

&lt;P&gt;Also, I do still think the number of syscalls here might be a red herring re: your overall performance issues. Have you run an I/O benchmark (e.g. bonnie++) to see what kind of performance you get on your NFS clients?&lt;/P&gt;</description>
      <pubDate>Mon, 14 Nov 2011 20:29:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98251#M1180</guid>
      <dc:creator>ewoo</dc:creator>
      <dc:date>2011-11-14T20:29:16Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98252#M1181</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;Have you run an I/O benchmark (e.g. bonnie++) to see&lt;BR /&gt;
what kind of performance you get on your NFS clients?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;One "gotcha" I've observed with NFS performance is that some NFS configurations produce unusually poor create/delete performance (relative to overall read/write), which significantly impacts search.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Nov 2011 20:33:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98252#M1181</guid>
      <dc:creator>ewoo</dc:creator>
      <dc:date>2011-11-14T20:33:04Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98253#M1182</link>
      <description>&lt;P&gt;We were having the same performance problems with SH pooling and NFS. &lt;/P&gt;

&lt;P&gt;We had to change the sysctl parameter "&lt;CODE&gt;sunrpc.tcp_slot_table_entries&lt;/CODE&gt;" from its default value of 16 to a more aggressive value of 128. This parameter controls how many RPC requests can be in flight at once. Splunk makes a large number of NFS requests, so many that they were queuing up instead of being serviced in real time. Increasing this value may cause the client to use many more NFSD threads on the server.&lt;/P&gt;

&lt;P&gt;A quick Google search tells me that a similar setting on Solaris might be "&lt;CODE&gt;nfs3_max_threads&lt;/CODE&gt;".&lt;/P&gt;

&lt;P&gt;Other tweaks include the following mount options for the NFS volume: &lt;CODE&gt;noatime,actimeo=60&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Good luck with your setup!&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2012 17:10:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98253#M1182</guid>
      <dc:creator>jacobwilkins</dc:creator>
      <dc:date>2012-02-10T17:10:39Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98254#M1183</link>
      <description>&lt;P&gt;Thanks for the hints. I'll have a go at that next week when I get back to the office.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Feb 2012 15:26:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98254#M1183</guid>
      <dc:creator>chris</dc:creator>
      <dc:date>2012-02-11T15:26:15Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98255#M1184</link>
      <description>&lt;P&gt;Chris,&lt;/P&gt;

&lt;P&gt;What was the result of it all? I'm also experiencing huge NFS performance issues.&lt;/P&gt;

&lt;P&gt;I tried using mounted bundles, which did solve the jobs exiting with 255 errors, but the search heads are still timing out.&lt;/P&gt;

&lt;P&gt;We're using the latest RHEL.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Aug 2014 10:59:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98255#M1184</guid>
      <dc:creator>theunf</dc:creator>
      <dc:date>2014-08-28T10:59:06Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98256#M1185</link>
      <description>&lt;P&gt;Hi theunf, we never solved this problem. We had a remote meeting with a Splunk engineer who recommended not using that setup. I have since left that company and, as a consultant, have seen other companies try and fail with search head pooling. I think that a new pooling mechanism will be released in one of the next releases. At my current job we use different search heads for different user groups. Maybe someone from Splunk can give an update on this...&lt;/P&gt;</description>
      <pubDate>Thu, 28 Aug 2014 13:14:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98256#M1185</guid>
      <dc:creator>chris</dc:creator>
      <dc:date>2014-08-28T13:14:52Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98257#M1186</link>
      <description>&lt;P&gt;theunf: Is your NFS infrastructure sufficiently robust, based on your performance needs? If you haven't already, please check out the "Performance analysis" section here: &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/DistSearch/Searchheadpoolingconfigurationissues"&gt;http://docs.splunk.com/Documentation/Splunk/latest/DistSearch/Searchheadpoolingconfigurationissues&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Some common issues include:&lt;BR /&gt;
- latency to the NFS server is too high, e.g. search heads and NFS server are in different datacenters&lt;BR /&gt;
- concurrency settings on NFS clients are too low&lt;/P&gt;
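
&lt;P&gt;If it helps, here is a minimal sketch for getting a rough feel for file create/delete latency on the shared mount. It is not one of the validation tests from the documentation, and the function name, file names and the count of 200 are all hypothetical. As written it runs against a local temporary directory; to measure NFS, point it at a scratch directory on the mount instead.&lt;/P&gt;

```python
import os
import tempfile
import time


def time_create_delete(directory, count=200):
    """Create and then remove `count` small files in `directory`.

    Returns (seconds_per_create, seconds_per_delete) as averages.
    """
    paths = [os.path.join(directory, f"latency_probe_{i}.tmp") for i in range(count)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "w") as handle:
            handle.write("x")
    create_elapsed = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    delete_elapsed = time.perf_counter() - start

    return create_elapsed / count, delete_elapsed / count


if __name__ == "__main__":
    # Replace the temporary directory with a scratch directory on the NFS mount.
    with tempfile.TemporaryDirectory() as scratch:
        per_create, per_delete = time_create_delete(scratch)
        print(f"create: {per_create * 1000:.3f} ms, delete: {per_delete * 1000:.3f} ms")
```

&lt;P&gt;Comparing the numbers from a local disk with those from the NFS mount gives a quick indication of whether metadata operations, rather than raw throughput, are the slow part.&lt;/P&gt;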

&lt;P&gt;Does your NFS setup pass the validation "tests" described in the performance analysis documentation?&lt;/P&gt;</description>
      <pubDate>Thu, 28 Aug 2014 19:15:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98257#M1186</guid>
      <dc:creator>ewoo</dc:creator>
      <dc:date>2014-08-28T19:15:50Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98258#M1187</link>
      <description>&lt;P&gt;ewoo,&lt;/P&gt;

&lt;P&gt;Unique ServerNames, NTP, permissions, all 10GbE interfaces, ping RTT flat at 1-2 ms, and no retransmissions or errors logged on the NFS client or server.&lt;/P&gt;

&lt;P&gt;Changed the ulimits (soft to 102400, hard to 409600) and set sysctl fs.file-max=65000000.&lt;/P&gt;

&lt;P&gt;Added to the fstab mount options: _netdev,auto,async,noatime,rsize=1048576,wsize=1048576&lt;/P&gt;

&lt;P&gt;With that, I ran bonnie++ on a machine with 128 GB of RAM and got more than 1300 IOPS, but I'm not satisfied.&lt;/P&gt;

&lt;P&gt;Today I'll test changing the NFS client concurrency limits:&lt;BR /&gt;
  $ cat /proc/sys/sunrpc/tcp_slot_table_entries&lt;BR /&gt;
  2 -&amp;gt; to 16&lt;BR /&gt;
  $ cat /proc/sys/sunrpc/tcp_max_slot_table_entries&lt;BR /&gt;
  65536 -&amp;gt; to 6553600&lt;/P&gt;
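
&lt;P&gt;To record these values before and after a change, something like the following minimal sketch could be used. The two file names are the ones shown above; the function name is hypothetical, and the sketch assumes a Linux client that exposes the settings under /proc/sys/sunrpc.&lt;/P&gt;

```python
import os


def read_sunrpc_slot_settings(sunrpc_dir="/proc/sys/sunrpc"):
    """Return both slot-table values as integers, or None where a file is missing."""
    settings = {}
    for name in ("tcp_slot_table_entries", "tcp_max_slot_table_entries"):
        path = os.path.join(sunrpc_dir, name)
        try:
            with open(path) as handle:
                settings[name] = int(handle.read().strip())
        except FileNotFoundError:
            # The sunrpc module may not be loaded, or the kernel may lack this setting.
            settings[name] = None
    return settings


if __name__ == "__main__":
    for name, value in read_sunrpc_slot_settings().items():
        print(f"{name} = {value}")
```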

&lt;P&gt;I read a Dell article saying that they could reach 100000 IOPS using a common NFS daemon and 7200 rpm disks, but that did not inspire me that much &lt;span class="lia-unicode-emoji" title=":confused_face:"&gt;😕&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Have you tried serving the NFS share from a ramfs?&lt;BR /&gt;
  (http://askubuntu.com/questions/304165/setting-up-ram-disk-nfs)&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:27:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98258#M1187</guid>
      <dc:creator>theunf</dc:creator>
      <dc:date>2020-09-28T17:27:20Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98259#M1188</link>
      <description>&lt;P&gt;2ms ping is not good. For NFS to perform acceptably, you typically need sub-1ms ping. Is your NFS server on the same LAN, in the same datacenter as your search heads?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2014 20:13:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98259#M1188</guid>
      <dc:creator>ewoo</dc:creator>
      <dc:date>2014-09-02T20:13:30Z</dc:date>
    </item>
    <item>
      <title>Re: Search head pooling performance with NFS</title>
      <link>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98260#M1189</link>
      <description>&lt;P&gt;ewoo &lt;/P&gt;

&lt;P&gt;I wrote 1-2ms just because once in a while I think I saw 2ms.&lt;/P&gt;

&lt;P&gt;This is a local 10Gbe LAN, all on the same site.&lt;/P&gt;

&lt;P&gt;Each indexer is a physical server with 128 GB of RAM and 32 recent CPUs, with two 10GbE interfaces joined as a bond interface, running RHEL Server 6.5.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2014 21:20:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Monitoring-Splunk/Search-head-pooling-performance-with-NFS/m-p/98260#M1189</guid>
      <dc:creator>theunf</dc:creator>
      <dc:date>2014-09-02T21:20:57Z</dc:date>
    </item>
  </channel>
</rss>

