SmartStore - Data not being frozen/deleted

rbal_splunk
Splunk Employee

Issue:

We have a SmartStore deployment and are seeing continued, steady growth in S3 space. It appears the data in SmartStore is not honoring the "frozenTimePeriodInSecs" we have specified on a per-index basis. I have been informed that others are seeing this issue as well, and I have been asked to open a case.

rbal_splunk
Splunk Employee

For SmartStore, the Cluster Master has a component, CMMasterRemoteStorageThread, which runs every remote_storage_retention_period (defaults to 15 minutes) to check whether there are buckets in remote storage that need to be frozen in the cluster.

i) CMMasterRemoteStorageThread runs a search on all peers to retrieve the list of remote indexes with their frozenTimePeriodInSecs, maxGlobalDataSizeMB and maxGlobalRawDataSizeMB information. This is also logged in splunkd.log:

05-21-2019 02:24:00.292 +0000 INFO  CMMasterRemoteStorageThread - retrieving remote indexes info with search=| rest services/data/indexes datatype=all f=title f=frozenTimePeriodInSecs f=maxGlobalDataSizeMB f=remotePath f=disabled| search remotePath!="" AND disabled!=1| dedup title| fields title frozenTimePeriodInSecs maxGlobalDataSizeMB

So the REST call that gets the list of indexes is:

| rest services/data/indexes datatype=all f=title f=frozenTimePeriodInSecs f=maxGlobalDataSizeMB f=remotePath f=disabled| search remotePath!="" AND disabled!=1| dedup title| fields title frozenTimePeriodInSecs maxGlobalDataSizeMB

ii) It then runs a search on all the peers to retrieve the list of warm buckets that need to be frozen based on the frozenTimePeriodInSecs, maxGlobalDataSizeMB and maxGlobalRawDataSizeMB thresholds. splunkd.log has entries like:

05-21-2019 02:24:00.846 +0000 INFO  CMMasterRemoteStorageThread - Will initiate retrieving the list of buckets to be frozen for remote storage retention for index=_internal with frozenTimePeriodInSecs=2592000 and maxGlobalDataSizeMB=0
05-21-2019 02:24:00.846 +0000 INFO  CMMasterRemoteStorageThread - retrieving the list of buckets to be frozen for remote storage retention for index=_internal with search=| dbinspect index=_internal cached=true timeformat=%s| search state=warm OR state=cold| search modTime != 1| stats max(endEpoch) AS endEpoch BY bucketId| sort -endEpoch| search endEpoch<1555813440| fields bucketId, endEpoch
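As a sanity check on the log lines above, the endEpoch threshold in the logged dbinspect search appears to be simply the time of the check minus frozenTimePeriodInSecs. The Python sketch below reproduces that arithmetic using only the values from the log; the variable and function names are mine, and the "now minus retention" rule is an inference from these numbers, not something taken from Splunk source or documentation.

```python
from datetime import datetime, timezone

# Values copied from the splunkd.log entries above.
log_time = datetime(2019, 5, 21, 2, 24, 0, tzinfo=timezone.utc)  # 05-21-2019 02:24:00 +0000
frozen_time_period_in_secs = 2592000  # 30 days, as logged for index=_internal

# Assumption: cutoff = time of the retention check - frozenTimePeriodInSecs.
cutoff = int(log_time.timestamp()) - frozen_time_period_in_secs
print(cutoff)  # 1555813440, the same value as "endEpoch<1555813440" in the logged search


def past_retention(end_epoch, cutoff_epoch):
    """Return True if a bucket's newest event is older than the cutoff (hypothetical helper)."""
    return end_epoch < cutoff_epoch


print(past_retention(1555813439, cutoff))  # True: one second older than the cutoff
print(past_retention(1555813440, cutoff))  # False: exactly at the cutoff
```

If the threshold in your own splunkd.log does not line up with the frozenTimePeriodInSecs you expect for that index, that points at step (i) returning the wrong index settings.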

So here is a | dbinspect search to retrieve the buckets that should be frozen (a slightly modified version of the one from splunkd.log in (ii)):

| dbinspect index=* | join index [|rest /services/data/indexes| eval index=title | table index frozenTimePeriodInSecs ] 
| eval toNow=now()-endEpoch | convert num(toNow) | convert num(frozenTimePeriodInSecs)
| convert ctime(endEpoch) AS endEvent | convert ctime(startEpoch) AS startEvent 
| eval shouldBeFrozen=if( ( state!="hot"  AND state!="thawed" ) AND toNow>frozenTimePeriodInSecs,"yes","no") 
| table splunk_server index path id state startEvent endEvent shouldBeFrozen toNow frozenTimePeriodInSecs
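For anyone who wants to double-check the shouldBeFrozen logic outside of Splunk, here is a minimal Python sketch of the same condition as the eval in the search above. The bucket rows are made-up examples, not real dbinspect output, and the function name is hypothetical.

```python
def should_be_frozen(state, end_epoch, frozen_time_period_in_secs, now):
    """Mirror of the eval above: bucket is not hot/thawed and its newest event is past retention."""
    to_now = now - end_epoch
    if state not in ("hot", "thawed") and to_now > frozen_time_period_in_secs:
        return "yes"
    return "no"


now = 1558405440       # 05-21-2019 02:24:00 UTC, the log timestamp from (ii)
retention = 2592000    # frozenTimePeriodInSecs for _internal in the log

# Hypothetical bucket rows for illustration only.
buckets = [
    {"id": "a", "state": "warm", "endEpoch": now - retention - 1},  # just past retention
    {"id": "b", "state": "warm", "endEpoch": now - 100},            # recent data
    {"id": "c", "state": "hot", "endEpoch": now - 2 * retention},   # old, but still hot
]

results = {b["id"]: should_be_frozen(b["state"], b["endEpoch"], retention, now) for b in buckets}
print(results)  # {'a': 'yes', 'b': 'no', 'c': 'no'}
```

Buckets that show "yes" here but are still present in remote storage after a few retention cycles are the ones worth investigating.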

To debug an issue where buckets are not being deleted based on retention, my suggestion would be to check whether the searches in (i) and (ii) are returning the list of indexes and the list of buckets to be frozen.

In my experience, if the Monitoring Console is enabled on the Cluster Master, it changes the default search group in distsearch.conf, and that can cause these searches to not return the expected results.

Check whether the Monitoring Console is enabled on the CM.

nickhills
Ultra Champion

Hi @rbal_splunk, that is very interesting. I think I have experienced this too. In that case the CM was also the DMC.

Are you saying that when using SS you should avoid running the DMC on the CM, or is this being investigated as a bug? If so, do you have the ticket number?


rbal_splunk
Splunk Employee

Yes, avoiding the DMC (Monitoring Console) on the CM is an easy way to prevent this issue.

With some hacking, it may also be possible to prevent the issue even if you have no other option but to run the MC on the CM.
