[SmartStore] After migration to SmartStore, Splunk indexer cluster issues: bucket fixup pending for more than 48 hours

rbal_splunk
Splunk Employee

During the migration to SmartStore, the following issues were encountered.

Issue 1: Many buckets were stuck in the fixup queue in the state "Waiting target_wait_time before replication_bucket".
Recommended:
The cluster master won't issue build requests (asking a peer to make a bucket searchable) if SmartStore (S2) is in use and use_batch_remote_rep_changes is enabled (the default is true).

After I set use_batch_remote_rep_changes to false, the cluster reached a stable, all-green state. It's a hidden attribute.

Issue 2: After RF and SF were met and all data was searchable, disk usage on the homePath had increased manyfold. The SmartStore migration app ships with the following configuration:

=== server.conf ===
[cachemanager]
eviction_policy = noevict

The "noevict" setting is recommended only during migration. At the end of the migration, it needs to be reverted to the default value of lru.

[cachemanager]
eviction_policy = lru

Issue 3: It was found that max_size_kb=0; this setting means the cache manager is not configured.

server.conf:
[cachemanager]
max_size_kb=0

NOTE: max_cache_size in 7.2 replaces max_size_kb in 7.1.
For details, refer to: https://docs.splunk.com/Documentation/Splunk/7.1.4/Admin/Serverconf

The following configuration is recommended:

4.1) [diskUsage]
minFreeSpace = 10% of the disk space on the partition (in MB)
For details, refer to: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Serverconf

4.2) [cachemanager]
eviction_padding = 10% of the disk space on the partition (in bytes)
For details, refer to: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Serverconf

4.3) max_size_kb = 75% of the partition.
In Splunk version 7.1, the settings that govern cache size limits use different units (bytes, KB, and MB):
max_size_kb (KB)
minFreeSpace (MB)
eviction_padding (bytes)

If free space on a partition falls below ('minFreeSpace' + 'eviction_padding'), then the cache manager tries to evict data from remote storage enabled indexes.
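
Because the three 7.1 settings use three different units, the arithmetic is easy to get wrong. Here is a minimal Python sketch of the sizing rules above (75% / 10% / 10% of the partition) and of the eviction threshold. The function names are hypothetical, and the sketch assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 KB); check that assumption against the server.conf spec for your version before relying on the numbers.

```python
KB = 1024          # assumption: binary kilobyte
MB = 1024 * 1024   # assumption: binary megabyte


def cache_settings_71(partition_bytes: int) -> dict:
    """Derive the 7.1-style values from a partition size given in bytes.

    Hypothetical helper: it only applies the 75% / 10% / 10% rules of thumb
    from this post and converts each result to the unit its setting expects.
    """
    return {
        "max_size_kb": partition_bytes * 75 // 100 // KB,   # kilobytes
        "minFreeSpace": partition_bytes * 10 // 100 // MB,   # megabytes
        "eviction_padding": partition_bytes * 10 // 100,     # bytes
    }


def eviction_expected(free_bytes: int, min_free_space_mb: int,
                      eviction_padding_bytes: int) -> bool:
    """True when free space is below minFreeSpace + eviction_padding."""
    threshold_bytes = min_free_space_mb * MB + eviction_padding_bytes
    return free_bytes < threshold_bytes


if __name__ == "__main__":
    settings = cache_settings_71(1024 ** 4)  # a 1 TiB partition
    for name, value in settings.items():
        print(f"{name} = {value}")
```

For a 1 TiB partition this gives a combined threshold of roughly 205 GiB of free space, below which the cache manager would be expected to start evicting.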

Note 1: It was confirmed that the eviction_padding and max_size_kb settings are used in 7.1.2 and above. In release 7.2, max_cache_size replaced max_size_kb.

Note 2: When upgrading to Splunk 7.2, note that "max_size_kb" and "eviction_padding" have been changed to MB.
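
As an illustration of the unit change, the following Python sketch converts 7.1-style values (max_size_kb in KB, eviction_padding in bytes) to megabytes for 7.2. The function name is hypothetical, and it assumes 1 MB = 1024 KB = 1024 * 1024 bytes and that max_cache_size is the 7.2 name for the cache size limit, as noted above.

```python
def to_72_units(max_size_kb: int, eviction_padding_bytes: int) -> dict:
    """Convert 7.1-style values to megabytes (hypothetical helper).

    Integer division rounds down, so the converted limits never exceed
    the original ones.
    """
    return {
        "max_cache_size": max_size_kb // 1024,                          # KB -> MB
        "eviction_padding": eviction_padding_bytes // (1024 * 1024),    # bytes -> MB
    }


if __name__ == "__main__":
    # Values for a 1 TiB partition sized at 75% cache / 10% padding.
    print(to_72_units(805306368, 109951162777))
```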

Issue 5: What are the options for checking the size of an index?

I am still investigating this, but here are some searches I found along the way.

5.1) Total size of index files on each indexer (note the "cached=f"):

| dbinspect index=* cached=f |stats sum(sizeOnDiskMB) by splunk_server

5.2) Total size on remote:

| dbinspect index=* |stats sum(sizeOnDiskMB)

You can tweak index=* to use a different subset of indexes.
NOTE: This does not take into account report acceleration (RA) and data model acceleration (DMA) summaries. If those are needed, a different strategy is required.

5.3) You can combine the two searches to show both the in-cache and the total data size:

| dbinspect index=data cached=f
| rename sizeOnDiskMB as szMB_incache
| join splunk_server bucketId
[| dbinspect index=data | rename sizeOnDiskMB as szMB_total]
| table splunk_server bucketId szMB_total szMB_incache | addcoltotals | tail 1

gjanders
SplunkTrust

Nice post. Could you correct the typos "homeopath" and "replaed" with an edit? Thanks!

rbal_splunk
Splunk Employee

If the Splunk version is later than 7.2.4.2, setting use_batch_remote_rep_changes is not required.

aakashbhalla1
Engager

@rbal_splunk Where is the "use_batch_remote_rep_changes" setting configured? In server.conf?

seegeekrun
Path Finder

The setting use_batch_remote_rep_changes needs to be in server.conf and placed outside of any of the stanzas. If it's in one of the stanzas, it'll be flagged as an invalid key.

So just put it at the top of etc/system/local/server.conf. At least that's where I put it, and I didn't receive any errors from btool.
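
If you want a quick sanity check of that placement before restarting, here is a small Python sketch that reports whether a key is assigned before the first stanza header in the text of a .conf file. The function name is hypothetical and this is not Splunk's own parser; btool remains the authoritative check.

```python
def key_outside_stanzas(conf_text: str, key: str) -> bool:
    """Return True if `key` is assigned before the first [stanza] header.

    Hypothetical helper: a line-based scan that skips blank lines and
    '#' comments and stops at the first line starting with '['.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            return False  # reached a stanza without seeing the key
        name = line.split("=", 1)[0].strip()
        if name == key:
            return True
    return False


if __name__ == "__main__":
    sample = (
        "use_batch_remote_rep_changes = false\n"
        "\n"
        "[cachemanager]\n"
        "eviction_policy = lru\n"
    )
    print(key_outside_stanzas(sample, "use_batch_remote_rep_changes"))
```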
