We have a SmartStore deployment and are seeing continued steady growth in S3 space. It appears that data in SmartStore is not honoring the "frozenTimePeriodInSecs" we have specified on a per-index basis. I have been informed that others are seeing this issue as well, and I have been asked to open a case.
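For context, per-index retention with SmartStore is set in indexes.conf. The stanza below is only an illustrative sketch (the volume, bucket, and index names are hypothetical, not our actual config):

```ini
# indexes.conf -- illustrative example only
[volume:remote_store]
storageType = remote
path = s3://my-smartstore-bucket/indexes

[my_index]
remotePath = volume:remote_store/$_index_name
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
# Freeze (remove from remote storage) buckets whose newest event is
# older than 30 days
frozenTimePeriodInSecs = 2592000
```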
For SmartStore, the Cluster Master has a component, CMMasterRemoteStorageThread, which runs every remote_storage_retention_period (defaults to 15 minutes) to check whether there are buckets in remote storage that need to be frozen in the cluster.
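That interval is configurable on the Cluster Master. Assuming the setting name is `remote_storage_retention_period` under the `[clustering]` stanza of server.conf (please verify against the spec file for your version), tuning it would look like:

```ini
# server.conf on the Cluster Master -- illustrative sketch
[clustering]
mode = master
# How often (in seconds) CMMasterRemoteStorageThread checks remote
# storage for buckets to freeze; default is 900 (15 minutes)
remote_storage_retention_period = 900
```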
i) Component CMMasterRemoteStorageThread runs a search on all peers to retrieve the list of remote indexes along with their frozenTimePeriodInSecs, maxGlobalDataSizeMB, and maxGlobalRawDataSizeMB settings. This is also tracked in splunkd.log:
05-21-2019 02:24:00.292 +0000 INFO CMMasterRemoteStorageThread - retrieving remote indexes info with search=| rest services/data/indexes datatype=all f=title f=frozenTimePeriodInSecs f=maxGlobalDataSizeMB f=remotePath f=disabled| search remotePath!="" AND disabled!=1| dedup title| fields title frozenTimePeriodInSecs maxGlobalDataSizeMB
So the REST search that retrieves the list of indexes is:
| rest services/data/indexes datatype=all f=title f=frozenTimePeriodInSecs f=maxGlobalDataSizeMB f=remotePath f=disabled| search remotePath!="" AND disabled!=1| dedup title| fields title frozenTimePeriodInSecs maxGlobalDataSizeMB
ii) It then runs a search on all the peers to retrieve the list of warm buckets that need to be frozen based on the frozenTimePeriodInSecs, maxGlobalDataSizeMB, and maxGlobalRawDataSizeMB thresholds. splunkd.log has entries like:
05-21-2019 02:24:00.846 +0000 INFO CMMasterRemoteStorageThread - Will initiate retrieving the list of buckets to be frozen for remote storage retention for index=_internal with frozenTimePeriodInSecs=2592000 and maxGlobalDataSizeMB=0
05-21-2019 02:24:00.846 +0000 INFO CMMasterRemoteStorageThread - retrieving the list of buckets to be frozen for remote storage retention for index=_internal with search=| dbinspect index=_internal cached=true timeformat=%s| search state=warm OR state=cold| search modTime != 1| stats max(endEpoch) AS endEpoch BY bucketId| sort -endEpoch| search endEpoch<1555813440| fields bucketId, endEpoch
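Note that the endEpoch cutoff in the generated search is simply the current time minus frozenTimePeriodInSecs. A quick Python check against the numbers in the log entry above (05-21-2019 02:24:00 UTC is epoch 1558405440):

```python
# cutoff = now - frozenTimePeriodInSecs; buckets whose newest event
# (endEpoch) is older than the cutoff are candidates for freezing.
now = 1_558_405_440            # 2019-05-21 02:24:00 UTC, from the log
frozen_time_period_secs = 2_592_000   # 30 days, from the log
cutoff = now - frozen_time_period_secs
print(cutoff)  # 1555813440, matching "endEpoch<1555813440" in the log
```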
So here is the |dbinspect search to retrieve the buckets to be frozen (a slightly modified version of the search from the splunkd.log entry in (ii)):
| dbinspect index=* | join index [|rest /services/data/indexes| eval index=title | table index frozenTimePeriodInSecs ] | eval toNow=now()-endEpoch | convert num(toNow) | convert num(frozenTimePeriodInSecs) | convert ctime(endEpoch) AS endEvent | convert ctime(startEpoch) AS startEvent | eval shouldBeFrozen=if( ( state!="hot" AND state!="thawed" ) AND toNow>frozenTimePeriodInSecs,"yes","no") | table splunk_server index path id state startEvent endEvent shouldBeFrozen toNow frozenTimePeriodInSecs
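The shouldBeFrozen decision in the search above reduces to a simple predicate. Here is a minimal Python sketch of that logic (field names mirror the dbinspect output; the function itself is hypothetical, for illustration only):

```python
import time

def should_be_frozen(state, end_epoch, frozen_time_period_secs, now=None):
    """Mirror the search's shouldBeFrozen logic: a bucket qualifies
    when it is neither hot nor thawed and its newest event (endEpoch)
    is older than frozenTimePeriodInSecs."""
    now = time.time() if now is None else now
    age = now - end_epoch
    return state not in ("hot", "thawed") and age > frozen_time_period_secs

now = 1_700_000_000  # hypothetical "now"
# A warm bucket 40 days old, against 30-day retention, should freeze:
print(should_be_frozen("warm", now - 40 * 86400, 2_592_000, now))  # True
# A hot bucket is never frozen by remote-storage retention:
print(should_be_frozen("hot", now - 40 * 86400, 2_592_000, now))   # False
```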
To debug an issue where buckets are not being deleted based on retention, the suggestion is to check whether the searches in (i) and (ii) are returning the list of indexes and the list of buckets to be frozen.
Based on my experience, if the Monitoring Console is enabled on the Cluster Master, it changes the default search group in distsearch.conf, and that can cause these searches to not return the expected results.
Check whether the Monitoring Console is enabled on the CM.
hi @rbal_splunk That is very interesting. I think I have experienced this too. In that case the CM was also the DMC.
Are you saying that when using SS you should avoid DMC on CM, or is this being investigated as a bug? If so, do you have the ticket number?
Yes, avoiding the DMC (Monitoring Console) on the CM is an easy way to prevent this issue.
With some workarounds it may also be possible to prevent the issue even if you have no option but to run the MC on the CM.
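For completeness, the kind of distsearch.conf change involved would look like the following. Treat this purely as an illustrative sketch under the assumption that an MC-created dmc_group_* search group has become the default group and restricted which peers the CM's retention searches query; the stanza and server names are hypothetical, not a verified fix:

```ini
# distsearch.conf on the Cluster Master -- illustrative sketch only
# The Monitoring Console adds dmc_group_* distributed search groups.
# If such a group ends up as the default, searches run from the CM may
# only reach a subset of peers. Ensuring no restrictive group is the
# default is the idea behind the workaround.
[distributedSearch:dmc_group_indexer]
default = false
servers = peer1:8089,peer2:8089
```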