Deployment Architecture

SmartStore cache manager not enforcing limit enforced by

rbal_splunk
Splunk Employee

The clustered indexers across the site are configured with SmartStore. Each indexer has a 6 TB partition that is shared by $SPLUNK_HOME and $SPLUNK_DB.


The cache manager is configured as follows:

$SPLUNK_HOME/etc/system/default/server.conf [diskUsage]
$SPLUNK_HOME/etc/system/default/server.conf minFreeSpace = 5000
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf [cachemanager]
$SPLUNK_HOME/etc/system/default/server.conf evict_on_stable = false
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf eviction_padding = 5120
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf eviction_policy = lru
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf hotlist_bloom_filter_recency_hours = 720
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf hotlist_recency_secs = 604800
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf max_cache_size = 4096000
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf max_concurrent_downloads = 8
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf max_concurrent_uploads = 8
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf remote.s3.multipart_max_connections = 4
$SPLUNK_HOME/etc/slave-apps/_cluster/local/server.conf remote.s3.multipart_upload.part_size = 536870912
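As a quick sanity check, the megabyte-based settings above can be converted to bytes. This is only a sketch: it assumes 1 MB = 1024 * 1024 bytes, and the helper name `mb_to_bytes` is made up for illustration (it is not part of Splunk). Under that assumption the results line up with the minfreebytes, cachereserve, totalpadding and maxSize values that the cache manager reports in its DEBUG output.

```python
# Convert the MB-based settings from the btool output into bytes.
# Assumption: 1 MB = 1024 * 1024 bytes.
MB = 1024 * 1024


def mb_to_bytes(megabytes: int) -> int:
    """Hypothetical helper: megabytes to bytes."""
    return megabytes * MB


settings_mb = {
    "minFreeSpace": 5000,       # [diskUsage]
    "eviction_padding": 5120,   # [cachemanager]
    "max_cache_size": 4096000,  # [cachemanager]
}

settings_bytes = {name: mb_to_bytes(value) for name, value in settings_mb.items()}

# Free-space threshold described in the docs: minFreeSpace + eviction_padding.
eviction_threshold = settings_bytes["minFreeSpace"] + settings_bytes["eviction_padding"]

for name, value in settings_bytes.items():
    print(f"{name} = {value} bytes")
print(f"minFreeSpace + eviction_padding = {eviction_threshold} bytes")
```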

The indexer shows the 6 TB partition as 97% utilized, although cache usage should not have exceeded 4 TB based on max_cache_size = 4096000:

Filesystem      1K-blocks       Used Available Use% Mounted on
devtmpfs         71967028          0  71967028   0% /dev
tmpfs            71990600          0  71990600   0% /dev/shm
tmpfs            71990600    4219944  67770656   6% /run
tmpfs            71990600          0  71990600   0% /sys/fs/cgroup
/dev/nvme0n1p2   20959212    6812056  14147156  33% /
none             71990600          0  71990600   0% /run/shm
/dev/nvme1n1   6391527336 5864488560 204899848  97% /opt/splunk
tmpfs            14398120          0  14398120   0% /run/user/1003
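To put the df numbers next to the configured limit, here is a rough sketch that parses the /opt/splunk row and compares the used space with max_cache_size. The function name `parse_df_row` is hypothetical, and the comparison is only indicative: the partition also holds $SPLUNK_HOME, so not every used byte is cache.

```python
import math

# The /opt/splunk row from the df output above (sizes are 1K-blocks).
DF_ROW = "/dev/nvme1n1   6391527336 5864488560 204899848  97% /opt/splunk"


def parse_df_row(row: str) -> dict:
    """Hypothetical helper: split one df row and convert 1K-blocks to bytes."""
    filesystem, total_kb, used_kb, avail_kb, _use_pct, mount = row.split()
    return {
        "filesystem": filesystem,
        "mount": mount,
        "total_bytes": int(total_kb) * 1024,
        "used_bytes": int(used_kb) * 1024,
        "avail_bytes": int(avail_kb) * 1024,
    }


row = parse_df_row(DF_ROW)

# max_cache_size = 4096000 MB, assuming 1 MB = 1024 * 1024 bytes.
max_cache_bytes = 4096000 * 1024 * 1024

# How far the used space is beyond the configured cache limit.
overshoot_bytes = row["used_bytes"] - max_cache_bytes

# df reports used / (used + available), rounded up.
use_pct = math.ceil(100 * row["used_bytes"] / (row["used_bytes"] + row["avail_bytes"]))

print(f"used on {row['mount']}: {row['used_bytes']} bytes ({use_pct}%)")
print(f"beyond max_cache_size: {overshoot_bytes} bytes")
```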

Here are the DEBUG entries for CacheManager:

  • 06-10-2020 19:32:42.604 +0000 DEBUG CacheManager - The system has freebytes=210838605824 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000
  • 06-10-2020 19:32:42.607 +0000 DEBUG CacheManager - The system has freebytes=210838536192 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000
  • 06-10-2020 19:32:46.502 +0000 DEBUG CacheManager - The system has freebytes=210850021376 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000
  • 06-10-2020 19:32:46.505 +0000 DEBUG CacheManager - The system has freebytes=210850172928 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000
  • 06-10-2020 19:33:06.727 +0000 DEBUG CacheManager - The system has freebytes=210255511552 with minfreebytes=5242880000 cachereserve=5368709120 totalpadding=10611589120 buckets_size=3069799919616 maxSize=4294967296000
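If you want to work with these numbers outside of Splunk, the key=value pairs in a line like the ones above can be pulled out with a regular expression. This is a minimal sketch; `parse_cachemanager_line` is a made-up name, and the pattern only assumes the `key=<digits>` layout visible in the log lines quoted here.

```python
import re

# One of the DEBUG lines quoted above.
SAMPLE = (
    "06-10-2020 19:32:42.607 +0000 DEBUG CacheManager - The system has "
    "freebytes=210838536192 with minfreebytes=5242880000 "
    "cachereserve=5368709120 totalpadding=10611589120 "
    "buckets_size=3069799919616 maxSize=4294967296000"
)

# Matches key=<digits> pairs; the timestamp contains no '=' so it is skipped.
KEY_VALUE = re.compile(r"(\w+)=(\d+)")


def parse_cachemanager_line(line: str) -> dict:
    """Hypothetical helper: return the numeric fields of a CacheManager DEBUG line."""
    return {key: int(value) for key, value in KEY_VALUE.findall(line)}


stats = parse_cachemanager_line(SAMPLE)
for key, value in stats.items():
    # Decimal terabytes, for easier comparison with the partition size.
    print(f"{key:>13} = {value:>15} bytes ({value / 1e12:.3f} TB)")
```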

Notes from the DEBUG output:

  • freebytes=  210072649728
  • minfreebytes=  5242880000
  • cachereserve=  5368709120
  • totalpadding=  10611589120
  • buckets_size=  3069785296896     <<<<<< ~3 TB, as calculated by the CacheManager
  • maxSize=  4294967296000     <<<<<< configured 4 TB limit

The issue is that the cache has used almost all of the 6 TB of disk space, but the cache manager's own calculation shows only 3 TB in use. Because of this miscalculation, Splunk is not evicting buckets.
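The reasoning can be sketched with the numbers from this post. The model below is an assumption, not Splunk's actual code: it treats eviction as triggered either when free space drops below totalpadding or when the tracked bucket size exceeds maxSize. With the values from the DEBUG line and the df output, neither condition is true, even though the disk is nearly full.

```python
# Values taken from the DEBUG line and the df output quoted in this thread.
freebytes = 210838605824
totalpadding = 10611589120         # minfreebytes + cachereserve
buckets_size = 3069799919616       # what the cache manager thinks it holds
max_size = 4294967296000           # max_cache_size in bytes
disk_used_bytes = 5864488560 * 1024  # "Used" column for /opt/splunk, 1K-blocks


def eviction_triggers(free: int, padding: int, cached: int, limit: int) -> dict:
    """ASSUMPTION: simplified two-condition model, not Splunk's implementation."""
    return {
        "low_free_space": free < padding,
        "cache_over_limit": cached > limit,
    }


triggers = eviction_triggers(freebytes, totalpadding, buckets_size, max_size)

# Space in use on the partition that the cache manager does not account for.
# Part of this is $SPLUNK_HOME itself, so treat it as an upper bound.
unaccounted_bytes = disk_used_bytes - buckets_size

print(triggers)
print(f"unaccounted: {unaccounted_bytes} bytes ({unaccounted_bytes / 1e12:.2f} TB)")
```

Under this model, as long as buckets_size is undercounted, the size check stays false until free space falls all the way to the padding threshold.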

0 Karma

rbal_splunk
Splunk Employee

The max_cache_size computation is broken in versions 7.2.6 through 8.0.4.

As a workaround, the best option is to set max_cache_size=0 and enforce the cache limit with eviction_padding=<CONFIGURE_AS_PER_DESIRED_LIMIT>

For details on the JIRA, contact Splunk Support.

0 Karma

kranthimutyala
Explorer

Hi @rbal_splunk, we are planning to implement SmartStore in our existing environment (a distributed environment with non-clustered indexers). I have a few questions about this.

We have 15 indexers, each with 9 TB of total disk space, and our daily ingestion volume is ~5 TB.

Please let me know how much cache we need to reserve for 30 days of retention in SmartStore.

We are reluctant to set max_cache_size=0, since the cache would then occupy all of the free space and cause problems.

Additionally, if we decommission the servers and build new ones, how do we link the old SmartStore data to the new indexers?

Please advise. Thanks.

0 Karma

rbal_splunk
Splunk Employee

Currently there is an issue in the calculation of the cache size, so the recommendation would be to set max_cache_size=0 (so that it is ignored).

To manage the cache size, use eviction_padding, as described in the server.conf documentation:

https://docs.splunk.com/Documentation/Splunk/8.0.5/Admin/Serverconf

 

eviction_padding = <positive integer>
* Specifies the additional space, in megabytes, beyond 'minFreeSpace' that the
  cache manager uses as the threshold to start evicting data.
* If free space on a partition falls below
  ('minFreeSpace' + 'eviction_padding'), then the cache manager tries to evict
  data from remote storage enabled indexes.
* Default: 5120 (~5GB)

Set this value to the amount of disk space, in megabytes, that you would like to keep free in addition to minFreeSpace.
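As an illustration of the arithmetic, here is a small sketch that derives an eviction_padding value from a free-space target. The 2 TB target and the function name `eviction_padding_for` are hypothetical; the only thing taken from the documentation is that eviction starts when free space falls below minFreeSpace + eviction_padding.

```python
def eviction_padding_for(desired_free_mb: int, min_free_space_mb: int = 5000) -> int:
    """Hypothetical helper: padding (MB) so that eviction starts once free
    space drops below desired_free_mb, given minFreeSpace in MB."""
    padding = desired_free_mb - min_free_space_mb
    if padding <= 0:
        # eviction_padding is documented as a positive integer.
        raise ValueError("desired free space must exceed minFreeSpace")
    return padding


# Hypothetical target: keep about 2 TB free (expressed in MB, 1 TB = 1024 * 1024 MB).
desired_free_mb = 2 * 1024 * 1024

padding_mb = eviction_padding_for(desired_free_mb)
print(f"eviction_padding = {padding_mb}")
```

With minFreeSpace = 5000 as in the original post, this would give eviction_padding = 2092152; pick the target that fits your own partition.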

0 Karma

rjfneto_pags
New Member

How can I see these DEBUG entries for CacheManager?

 

Thanks in advance

0 Karma

garyjohnson48
Explorer

Go to Settings > Server settings > Server logging.

Type cachemanager in the search bar.

Click CacheManager and change its logging level to DEBUG.

Then search: index=_internal component=CacheManager

This only takes effect on the server where you made the change. You can also change the logging level in the configuration under $SPLUNK_HOME/etc (for example, in log-local.cfg).

0 Karma