I've read through the posts and cannot find an answer to this; forgive me if I missed a relevant post.
I'm specifically interested in how to use SmartStore (against an on-prem Scality S3 instance) for both warm and frozen data. Using mostly default indexes.conf settings, I've set up my index to use SmartStore for warm data; what I'm struggling with is how to define the location for frozen data.
The documentation is good and seems to suggest setting only two options, frozenTimePeriodInSecs and maxGlobalDataSizeMB, which I've done, but I cannot see where the frozen data ends up on my remote volume. Currently my index config looks like this:
[default]
[volume:remote_store]
storageType = remote
path = s3://splunk-db
remote.s3.access_key = blah
remote.s3.secret_key = blah
remote.s3.list_objects_version = v1
remote.s3.signature_version = v2
remote.s3.auth_region =
remote.s3.supports_versioning = true
remote.s3.endpoint = http://IP_ADDRESS
[scality]
frozenTimePeriodInSecs = 86400
maxGlobalDataSizeMB = 100
maxHotBuckets = 3
archiver.enableDataArchive = 0
bucketRebuildMemoryHint = 0
compressRawdata = 1
enableDataIntegrityControl = 0
enableOnlineBucketRepair = 1
enableTsidxReduction = 0
maxDataSize = auto_high_volume
minHotIdleSecsBeforeForceRoll = 0
rtRouterQueueSize =
rtRouterThreads =
selfStorageThreads =
suspendHotRollByDeleteQuery = 0
syncMeta = 1
tsidxWritingLevel =
I've tried having a separate stanza, and putting it all under [default], but in either case I cannot identify the frozen data on the S3 server. Can anyone give me a hint?
Answering my own question with what I have found, hopefully to help others.
Since Splunk normally deletes data when it freezes it, and I wanted to keep that data in the on-prem S3 store, I found you need to specify the frozen time period and a script that performs the work.
The cache manager calls this script with the path of the warm bucket that needs to be 'frozen'. In my very simple script, I just copy that bucket to an S3 target with the AWS CLI. This isn't designed for scalability, and I'm copying the whole bucket rather than tar/gzipping the contents; all of that could be handled by a more thorough script.
In indexes.conf, either under [default] or in the specific index stanza, you need the following:
[scality]
coldToFrozenScript = "/bin/bash" "/opt/splunk/bin/coldToFrozenS3.sh"
frozenTimePeriodInSecs = 86400
And the script it's calling is extremely simple, like the following:
#!/bin/bash
# Called by Splunk with the path of the bucket to freeze as $1
set -e
set -u
bucket=$1
# Bucket directory name only, e.g. db_<newest>_<oldest>_<id>_<guid>
warm=$(basename "$bucket")
echo "bucket to move: $bucket" >> /var/log/splunkToS3.log
/usr/bin/aws --profile default --endpoint-url http://[IP]:[port] s3 mv "$bucket" "s3://s3-bucket/scality/frozen/$warm" --recursive >> /var/log/splunkToS3.log 2>&1
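For reference, the argument the cache manager passes is the full path of the bucket directory to freeze, so you can test the script by hand with something like the following (the path and bucket name are purely illustrative, assuming the default SPLUNK_DB layout for an index named scality):
/bin/bash /opt/splunk/bin/coldToFrozenS3.sh /opt/splunk/var/lib/splunk/scality/db/db_1548892800_1546300800_42_3F2504E0-4F89-11D3-9A0C-0305E82C3301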
SmartStore cannot store frozen data. You need to roll a custom solution to move frozen data to S3.
Some tips on how to do so are here: https://answers.splunk.com/answers/293894/how-to-put-cold-and-frozen-data-on-s3-in-aws.html
All the best
Thanks for that, I'll look at the answer.
So what is happening to the data that meets the age or max size criteria? Is it just being deleted?
Thanks for that. I'd read that earlier. I guess what's maybe not clear to me, but obvious to Splunk-aware folks, is that "freezing the bucket" to me means putting it into a state where it's locked and unused, but perhaps to everyone else it means deleted?
What I want to figure out is how to make "freezing" the bucket mean either leaving it in S3 or moving it to a separate S3 bucket in a locked state.
There is no official way to put frozen buckets in S3 that I am aware of. There are plenty of people who do it, though. Frozen means no longer searchable, and for most people that means deleted.
Ok I guess that's a splunkism.
To me "deleted" means "deleted", "frozen" means locked into it's current state and unmodifiable, but still retrievable in some form, i.e. you still have the data, whereas if it's deleted it's really gone and cannot be recovered or restored.
Yes, it's a confusing term. But you can configure the data to be saved if you want; it's just that it's deleted by default.
Yes it will be deleted.
If you have spare capacity in your S3, then just set your frozenTimePeriodInSecs and maxGlobalDataSizeMB to be large.
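For example, something like this in the index stanza (the numbers below are only placeholders; pick whatever effectively means "never" for your retention needs):
[scality]
# roughly 10 years before buckets are eligible to freeze
frozenTimePeriodInSecs = 315360000
# allow up to ~100 TiB of SmartStore data for this index
maxGlobalDataSizeMB = 104857600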
I took a look at that article, but it's from 2015, prior to SmartStore.
So does that mean that, despite all the SmartStore literature describing indexes rolling from hot -> warm -> frozen, there's still no way within SmartStore to manage the frozen data, and it has to be done by file system methods?
I've been looking at this: https://answers.splunk.com/answers/709709/splunk-smart-store-s3-bucket-config-to-indexes-72x.html
But that configuration suggests all indexes go to volume:frozen in the same way that, in my config, they go to splunk-db.
Looking at using a stanza for my index, I can't see how to define two remote targets, one for general indexes and one for frozen indexes, and then configure the stanza to use the frozen one, the way the index edit screen lets you enter an optional frozen path.
So my question remains: how can you have SmartStore use an S3 target for its warm buckets, but also use it (on an optionally different path) for frozen data?
Answering my own post here, but I thought it worth sharing. This article explains quite well what's happening in the context of SmartStore.
https://docs.splunk.com/Documentation/Splunk/7.2.3/Indexer/Automatearchiving
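If I'm reading that doc right, there are two ways to keep frozen data instead of letting Splunk delete it: a coldToFrozenScript like the one above, or coldToFrozenDir if you just want the frozen buckets copied to a local path. A minimal sketch of the directory form (the path is just an example):
[scality]
# keep frozen buckets here instead of deleting them
coldToFrozenDir = /opt/splunk/frozen/scality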
Nice work. Thanks for sharing
The theory being that it'll keep warm buckets indefinitely?
I have customers who are interested in freezing the data (if for no other reason than to lock it down), so they want both warm and frozen on the same S3 target, one in a modifiable format and the other not.
I've read all the SmartStore docs and many of the Answers links, but it's not entirely clear whether the data gets deleted or simply rolled into another volume.
I was thinking of doing something like this: https://answers.splunk.com/answers/709709/splunk-smart-store-s3-bucket-config-to-indexes-72x.html
i.e. using a second volume for frozen data, but I'm not sure how the system would handle two definitions of storageType = remote.
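Roughly what I have in mind is the sketch below, with a second remote volume alongside the existing one (the volume and bucket names are made up); what I can't see is any SmartStore setting in the index stanza that would actually point frozen buckets at that second volume:
[volume:remote_store]
storageType = remote
path = s3://splunk-db
remote.s3.endpoint = http://IP_ADDRESS

[volume:frozen_store]
storageType = remote
path = s3://splunk-frozen
remote.s3.endpoint = http://IP_ADDRESS

[scality]
remotePath = volume:remote_store/$_index_name
# ...and then something here to send frozen buckets to volume:frozen_store?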