We have a SmartStore deployment and are seeing continued steady growth in S3 space. It appears the data in SmartStore is not honoring the "frozenTimePeriodInSecs" we have specified on a per-index basis. I have been informed others are seeing this issue as well, and I have been asked to open a case.
For SmartStore, the Cluster Master has a component, CMMasterRemoteStorageThread, which runs every remote_storage_retention_period (defaults to 15 minutes) to check whether there are buckets in remote storage that need to be frozen in the cluster.
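For reference, this interval is controlled in server.conf on the Cluster Master; a minimal sketch showing the default value (900 seconds = 15 minutes):

```ini
# server.conf on the Cluster Master
[clustering]
# How often (in seconds) the master checks remote storage for
# buckets that should be frozen; 900 is the default
remote_storage_retention_period = 900
```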
i) Component CMMasterRemoteStorageThread runs a search on all peers to retrieve the list of remote indexes with their frozenTimePeriodInSecs, maxGlobalDataSizeMB and maxGlobalRawDataSizeMB information. This is also tracked in splunkd.log:
05-21-2019 02:24:00.292 +0000 INFO CMMasterRemoteStorageThread - retrieving remote indexes info with search=| rest services/data/indexes datatype=all f=title f=frozenTimePeriodInSecs f=maxGlobalDataSizeMB f=remotePath f=disabled| search remotePath!="" AND disabled!=1| dedup title| fields title frozenTimePeriodInSecs maxGlobalDataSizeMB
So the REST search that gets the list of indexes is:
| rest services/data/indexes datatype=all f=title f=frozenTimePeriodInSecs f=maxGlobalDataSizeMB f=remotePath f=disabled| search remotePath!="" AND disabled!=1| dedup title| fields title frozenTimePeriodInSecs maxGlobalDataSizeMB
ii) It then runs a search on all the peers to retrieve the list of warm buckets that need to be frozen based on the frozenTimePeriodInSecs, maxGlobalDataSizeMB and maxGlobalRawDataSizeMB thresholds. The splunkd.log has entries like:
05-21-2019 02:24:00.846 +0000 INFO CMMasterRemoteStorageThread - Will initiate retrieving the list of buckets to be frozen for remote storage retention for index=_internal with frozenTimePeriodInSecs=2592000 and maxGlobalDataSizeMB=0
05-21-2019 02:24:00.846 +0000 INFO CMMasterRemoteStorageThread - retrieving the list of buckets to be frozen for remote storage retention for index=_internal with search=| dbinspect index=_internal cached=true timeformat=%s| search state=warm OR state=cold| search modTime != 1| stats max(endEpoch) AS endEpoch BY bucketId| sort -endEpoch| search endEpoch<1555813440| fields bucketId, endEpoch
So here is the |dbinspect search to retrieve the buckets to be frozen (a slightly modified version of the search from the splunkd.log entry in (ii)):
| dbinspect index=* | join index [|rest /services/data/indexes| eval index=title | table index frozenTimePeriodInSecs ]
| eval toNow=now()-endEpoch | convert num(toNow) | convert num(frozenTimePeriodInSecs)
| convert ctime(endEpoch) AS endEvent | convert ctime(startEpoch) AS startEvent
| eval shouldBeFrozen=if( ( state!="hot" AND state!="thawed" ) AND toNow>frozenTimePeriodInSecs,"yes","no")
| table splunk_server index path id state startEvent endEvent shouldBeFrozen toNow frozenTimePeriodInSecs
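To make the retention condition in the search above concrete, here is a small Python sketch of the same shouldBeFrozen logic. This is illustrative only (the function name and sample values are hypothetical, not Splunk code): a warm or cold bucket becomes a freeze candidate once its newest event (endEpoch) is older than frozenTimePeriodInSecs.

```python
import time

def should_be_frozen(state, end_epoch, frozen_time_period_secs, now=None):
    """Mirror of the SPL condition: freeze a non-hot, non-thawed bucket
    whose newest event is older than the retention period."""
    now = time.time() if now is None else now
    if state in ("hot", "thawed"):
        return False
    return (now - end_epoch) > frozen_time_period_secs

# Example: with a 30-day retention (2592000 s), a warm bucket whose
# newest event is 40 days old should be frozen; a hot bucket never is.
now = 1_700_000_000
print(should_be_frozen("warm", now - 40 * 86400, 2_592_000, now))  # True
print(should_be_frozen("hot",  now - 40 * 86400, 2_592_000, now))  # False
```

Note that the decision keys off endEpoch (the newest event in the bucket), so a bucket is only frozen once all of its data has aged past the retention period.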
To debug the issue where buckets are not being deleted based on retention, the suggestion is to check whether the searches in (i) and (ii) are returning the list of indexes and the list of buckets to be frozen.
Based on my experience, if the Monitoring Console is enabled on the Cluster Master, it can change the default search group in distsearch.conf, and that can cause these searches to not return the expected results. You can inspect the effective distsearch.conf on the Cluster Master with $SPLUNK_HOME/bin/splunk btool distsearch list --debug to see which app last modified it.