In our Splunk installation, our indexes use remotePath pointing to an in-house S3 store. We have had situations where S3 is unavailable for hours and sometimes days. During these periods our indexers become unstable. Some of the stability issues we have seen:
1. Forwarders are unable to reach the indexers.
2. Indexers constantly restart.
Are there any configuration settings (timeouts, retries, etc.) that we can apply to make the environment more stable?
Your requirement may be manageable through configuration changes that cause Splunk to enter automatic detention when connectivity to the SmartStore remote store is lost. In that state searching remains enabled, but no additional data is indexed.
1) The settings in server.conf that initiate eviction based on occupancy of the cache's disk partition:
The max_cache_size setting specifies the maximum occupied space, in megabytes, for the disk partition that contains the cache.
The minFreeSpace setting specifies the minimum free space, in megabytes, for a partition.
The eviction_padding setting controls the amount of additional space, in megabytes, that the cache manager protects, beyond the minFreeSpace value.
When the occupied space on the cache's partition exceeds max_cache_size, or the partition's free space falls below (minFreeSpace + eviction_padding), the cache manager begins to evict data.
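As a rough illustration of how these three settings fit together, here is a minimal server.conf sketch. The numeric values are assumptions chosen only for the example and should be sized for your own cache partition. It also assumes the usual stanza placement, with max_cache_size and eviction_padding under [cachemanager] and minFreeSpace under [diskUsage].

```
# server.conf on each indexer (illustrative values, not recommendations)

[cachemanager]
# Begin evicting once the cache partition holds more than ~500 GB.
max_cache_size = 500000
# Extra headroom, in MB, protected on top of minFreeSpace.
eviction_padding = 5120

[diskUsage]
# Minimum free space, in MB, for the partition.
minFreeSpace = 5000
```

With these example numbers, eviction would start when occupied space exceeds 500000 MB or when free space falls below 5000 + 5120 = 10120 MB, whichever happens first.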
2) To protect recently indexed data from eviction, set cache retention periods based on data recency using the hotlist_recency_secs and hotlist_bloom_filter_recency_hours settings:
i) hotlist_recency_secs: causes the cache manager to protect buckets that contain recent data over other buckets. When eviction is necessary, the cache manager will not evict buckets until their configured retention periods have passed, unless all other buckets have already been evicted.
ii) hotlist_bloom_filter_recency_hours: protects metadata files, such as the bloomfilter file, from eviction.
The above settings can be applied at a global level (across all indexes) to favor recently indexed data over recently used data. When configured globally, they override the eviction policy. For example, if hotlist_recency_secs is set globally to 604800 (7 days), the cache manager will attempt to retain buckets with data that is less than seven days old. It will instead evict older buckets, even if those older buckets were searched more recently. The cache manager will only evict buckets containing data less than seven days old if there are no older buckets to evict.
To configure the hotlist_recency_secs and hotlist_bloom_filter_recency_hours settings globally, for all SmartStore indexes, you must set them in the [cachemanager] stanza in server.conf.
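A minimal sketch of that global configuration might look like the following. The 604800 value is the seven-day example from above; the bloom filter value of 360 hours is only an assumption for illustration and should be adjusted to your own retention needs.

```
# server.conf on each indexer (illustrative values)

[cachemanager]
# Protect buckets whose data is less than 7 days old.
hotlist_recency_secs = 604800
# Protect bloomfilter and other metadata files for 15 days (assumed value).
hotlist_bloom_filter_recency_hours = 360
```

A restart of the indexers is typically needed for server.conf changes like these to take effect, so it is worth testing on a single peer first.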
During a remote store connectivity issue, buckets will not upload to the remote store but will still roll to warm. Because of hotlist_recency_secs and hotlist_bloom_filter_recency_hours, recent data will not be evicted and will remain available for searching. Since eviction cannot free enough space, free disk space will eventually drop below minFreeSpace and the indexers will enter automatic detention.