Splunk SmartStore: Do warm buckets need to roll to frozen?

mccartneyc
Path Finder

I recently set up SmartStore with a test index that sends data to S3. It's working perfectly, but I have questions about warm-to-frozen rolling and archiving.

In the following Splunk doc, it says hot buckets roll to warm and get uploaded to S3, which is great, but it doesn't say whether they can be held there indefinitely. It then says, "Buckets roll to frozen directly from warm.", but doesn't say anything else about it. If buckets can go to S3 and never get rolled to frozen, that's okay, but if Splunk rolls them to frozen and deletes them without me giving it a reason to, that's something I need to avoid.

https://docs.splunk.com/Documentation/Splunk/7.3.2/Indexer/SmartStoreindexing#Bucket_states_and_Smar...

  1. After buckets roll to warm and go to the S3 bucket, if no settings for freezing are configured, will Splunk automatically roll the buckets to frozen after a while?

  2. If the warm buckets in the S3 bucket do not get rolled to frozen and no archiving is set up, will the data in S3 always remain as warm buckets, and will that cause any issues besides long searches?

  3. Splunk docs say the coldToFrozenScript can be used, and I've tried setting it up so that when warm buckets in S3 get rolled to frozen, the script takes the colddb buckets and archives them to another S3 bucket. But because the buckets never roll to cold on the local server, nothing gets archived. I can't get the script to work with Splunk, S3, and the cache manager. Is there a process or script for archiving SmartStore warm buckets to another S3 bucket without having to archive locally on the indexer?

Update 10/17/2019
I've made some changes to the script and tested a few things. I switched from python3 to Splunk's internal python2.7 and adjusted my code to ensure that Splunk was running the script with its own binaries. Splunk states that coldToFrozenScript can be used for SmartStore indexes, but in what capacity, I'm not sure.

If I run the script manually on a bucket that exists locally on the server (not in S3), the script runs just fine and the bucket gets copied to my S3 endpoint for archiving. I do this by running "/opt/splunk/bin/python2.7 /opt/splunk/etc/slave-apps/ColdToFrozenS3/bin/coldToFrozenS3.py ". It does exactly what I want with local buckets, but I'm trying to get this to run for when SmartStore moves warm s3 buckets to frozen.

So I deployed the script back to the indexers, and when Splunk tries to freeze a SmartStore bucket using the coldToFrozenScript, I get the errors below. It looks like Splunk is trying to retrieve the bucket through the cache manager and failing for some reason, or it is trying to start the script before the bucket is pulled from S3, or something else I'm not aware of.

One of the entries shows a 404 error which doesn't make sense as the servers are able to read and write from both S3 buckets and just for testing purposes, I've given their roles full access. Manually downloading and uploading from each indexer to each S3 bucket works fine, so not sure why the 404 is occurring.

10-17-2019 10:42:33.859 -0400 ERROR DatabaseDirectoryManager - failed to open bucket/wait for bucket to be local through CacheManager, cid="bid|smartstore_test~14~044B61B4-FEA8-4CC9-BEA9-C694C082BECA|", exception=localize operation failed for cacheId="bid|smartstore_test~14~044B61B4-FEA8-4CC9-BEA9-C694C082BECA|"

host =  <indexer.hostname.local>    
source =    /opt/splunk/var/log/splunk/splunkd.log  
sourcetype =    splunkd 

10-17-2019 10:42:33.928 -0400 ERROR RetryableClientTransaction - transactionDone(): transactionId=0x7f912e037000 rTxnId=0x7f90c6ffe0f0 success=N HTTP-statusCode=404 HTTP-statusDescription=Not Found retry=N no_retry_reason="transaction had fatal error"

host =  <indexer.hostname.local>    
source =    /opt/splunk/var/log/splunk/splunkd.log  
sourcetype =    splunkd 

10-17-2019 10:42:33.928 -0400 WARN BucketMover - RemoteStorageAsyncFreezer freeze failed for bid=smartstore_test~14~044B61B4-FEA8-4CC9-BEA9-C694C082BECA since coldToFrozenScript="/opt/splunk/bin/python2.7" "/opt/splunk/etc/slave-apps/ColdToFrozenS3/bin/coldToFrozenS3.py" could not be run due to exception=std::exception

host =  <indexer.hostname.local>    
source =    /opt/splunk/var/log/splunk/splunkd.log  
sourcetype =    splunkd
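
When chasing errors like these, it can help to list exactly which buckets failed to freeze. The following is a minimal sketch, assuming the BucketMover warning format shown in the log lines above; the function name `failed_freeze_bids` is made up for illustration and is not part of Splunk.

```shell
#!/bin/bash
# Sketch: print the unique bucket IDs (bid=...) from "freeze failed" warnings
# in a splunkd.log file. Assumes the message layout shown in this thread.
failed_freeze_bids() {
    local logfile="$1"
    # Keep only lines with the freeze-failed message and capture the bid value
    sed -n 's/.*freeze failed for bid=\([^ ]*\) .*/\1/p' "$logfile" | sort -u
}
```

For example, `failed_freeze_bids /opt/splunk/var/log/splunk/splunkd.log` would print one bucket ID per line.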

esix_splunk
Splunk Employee

So the behavior you're seeing is accurate. Splunk can't roll a bucket directly from S3 to frozen, since S3 is remote storage. The workflow for rolling to frozen is currently:

1) Bucket is nominated to be frozen
2) Indexer holding the primary stub for that bucket downloads the bucket to the cache manager
3) coldToFrozenScript is invoked
4) On success, the local bucket is removed from the cache manager, and the next run on the object storage will trigger a deletion of the bucket from the object storage
5) Cluster will remove the stub(s) for the frozen bucket

If you're using SmartStore and your object storage isn't constrained by space, you can technically set frozenTimePeriodInSecs to 10 years or so and not have to worry about freezing that data. Keep it searchable!
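
Since frozenTimePeriodInSecs takes a value in seconds, a quick way to work out the number for a given retention period is sketched below. It assumes 365-day years (leap days ignored), and the helper name `years_to_secs` is only illustrative.

```shell
#!/bin/bash
# Sketch: convert a retention period in years to seconds for frozenTimePeriodInSecs.
# Uses 365-day years and ignores leap days.
years_to_secs() {
    echo $(( $1 * 365 * 24 * 60 * 60 ))
}

# Print the setting line for a 10-year retention period
echo "frozenTimePeriodInSecs = $(years_to_secs 10)"
```

For 10 years this prints `frozenTimePeriodInSecs = 315360000`, which would go in the index's stanza in indexes.conf.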

BARNEYRUDD
Explorer

Nice explanation, I was wondering how to manage the lack of a cold bucket function.


mccartneyc
Path Finder

Late to update this, but I resolved my issue. SmartStore will roll the buckets to frozen by default unless you set the frozen time to 0, which will leave all warm buckets in S3.

I didn't want that as a long-term solution, but I was able to get SmartStore to handle the archiving using a bash script. (I tried using a Python script and kept running into errors with the bucket copies.)

Here is what I did:
1. Created a smartstore index
2. Created an S3 bucket called something like splunk-smartstore
3. In the S3 bucket I created prefixes, one for indexes and one for frozen archives
4. Created a coldToFrozen bash script and deployed it as an app on the indexer cluster.
5. Splunk servers use a role to authenticate to S3 bucket
6. Once all was in place, just sent data to the smartstore index and it handled the rest.

I found that with SmartStore indexes on a cluster, the cache manager handles the logic of which server archives a particular bucket. When a bucket rolls from warm to frozen, the cache manager downloads the warm bucket from the indexes prefix within the S3 bucket to one of the indexers. Splunk then takes the path to the bucket and passes it to the coldToFrozen script, which places the archive in the S3 bucket under archives. When archiving is successful, the cache manager deletes the local and remote copies of the warm bucket.

Below is the script I used. This was based off someone else's example in another thread, but I modified it for what I needed. I wanted to easily be able to go back to S3 and pull an archive based on specific date ranges and add in some organization to where things get stored in S3. The script will use the bucket path to create directories in S3 archives and convert Epoch time to EST. So far this has been working for about two months and no issues or data loss. And the disk space has stayed low as only hot buckets and warm bucket caches are on the disk.

#!/bin/bash
set -e
set -u
#Fail the script if any stage of a pipeline fails, so a failed "aws s3 cp" is not masked by tr
set -o pipefail

export HTTP_PROXY=http://<Proxy_IP>:<Proxy_Port>/
export HTTPS_PROXY=https://<Proxy_IP>:<Proxy_Port>/
export NO_PROXY=169.254.169.254

bucket=$1
instance=$(hostname -s)
region=<AWS REGION>
s3bucket=<Smartstore_S3_Bucket>
NOW=$(date +"%Y-%m-%d")
LOG=/opt/splunk/var/log/splunk/coldToFrozen-${NOW}.log

#Gets index name and warm bucket name from path passed by splunk
#(field positions depend on the path depth, e.g. /opt/splunk/var/lib/splunk/<index>/db/<bucket>)
index=$(echo "$bucket" | cut -f7 -d"/")
warm=$(echo "$bucket" | cut -f9 -d"/")

#Converts the epoch times from the warm bucket name to dates
#(date uses the server's local time zone, EST here; field 3 is the oldest event, field 2 the newest)
startEpoch=$(echo "$warm" | cut -f3 -d"_")
endEpoch=$(echo "$warm" | cut -f2 -d"_")
startDate=$(date -d "@$startEpoch" '+%m_%d_%Y')
endDate=$(date -d "@$endEpoch" '+%m_%d_%Y')

#Sets AWS Signature Version - Needed for S3 KMS-SSE
aws configure set s3.signature_version s3v4

#Creates log file
touch "${LOG}"

echo "bucket to move: " "$bucket" >> "$LOG"

#Copies bucket to S3 and logs the output along with timestamps
/usr/bin/aws s3 cp "${bucket}" "s3://${s3bucket}/frozen/${index}/${startDate}_to_${endDate}/${warm}" --recursive --region ${region} 2>&1 | tr "\r" "\n" > >(awk '{print strftime("%Y-%m-%d:%H:%M:%S ") $0}' >> "$LOG")
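
The `cut -f7` and `cut -f9` field positions in the script above only work when the bucket sits at a fixed directory depth. As a hedged alternative, the sketch below derives the same values relative to the end of the path. It assumes the layout `<something>/<index>/db/<bucket>` and a bucket name of the form `db_<newest>_<oldest>_<id>...`, which is what the script above also relies on. The function name `parse_bucket_path` and the epoch values in the usage line are made up for illustration.

```shell
#!/bin/bash
# Sketch: extract index name, bucket name, and oldest/newest epoch from a
# bucket path without depending on how deep the index directory is nested.
parse_bucket_path() {
    local path="${1%/}"   # strip a trailing slash, if any
    local bucket_name index_name prefix newest oldest rest
    bucket_name=$(basename "$path")
    # The index directory is two levels above the bucket: <index>/db/<bucket>
    index_name=$(basename "$(dirname "$(dirname "$path")")")
    # Split the bucket name on "_": prefix, newest epoch, oldest epoch, remainder
    IFS=_ read -r prefix newest oldest rest <<< "$bucket_name"
    echo "$index_name $bucket_name $oldest $newest"
}
```

Called as `parse_bucket_path /opt/splunk/var/lib/splunk/smartstore_test/db/db_1571000000_1570000000_14_044B61B4-FEA8-4CC9-BEA9-C694C082BECA`, it prints the index name, the bucket name, the oldest epoch, and the newest epoch on one line, separated by spaces.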

rosslopez
Observer

Are you saving the frozen buckets to a local directory first with coldToFrozenDir? I pushed out an app with just this script to my indexers, and I'm not seeing anything in my S3 bucket. When I run the script manually, I get

bin>./smartstore2frozen.sh
./smartstore2frozen.sh: line 5: $1: unbound variable

 

When I comment out 'set -u', I get

bin>./smartstore2frozen.sh
date: invalid date ‘@’

 

Here's the script. The only things I modified are that I removed the proxy portion and changed the bucket name

#!/bin/bash
set -e
set -u

bucket=$1
instance=$(hostname -s)
region=us-east-1
s3bucket=<my s3 bucket>
NOW=$(date +"%Y-%m-%d")
LOG=/opt/splunk/var/log/splunk/smartstore2frozen-${NOW}.log

#Gets index name and warm bucket name from path passed by splunk
index=$(echo $1 | cut -f7 -d"/")
warm=$(echo $1 | cut -f9 -d"/")

#Converts the epoch time from the warm bucket name to EST
startEpoch=$(echo $warm | cut -f3 -d"_")
endEpoch=$(echo $warm | cut -f2 -d"_")
startDate=$(date -d @$startEpoch '+%m_%d_%Y')
endDate=$(date -d @$endEpoch '+%m_%d_%Y')

#Sets AWS Signature Version - Needed for S3 KMS-SSE
aws configure set s3.signature_version s3v4

#Creates log file
touch ${LOG}

echo "bucket to move: " $bucket >> $LOG

#Copies bucket to S3 and logs the output along with timestamps
/usr/bin/aws s3 cp ${bucket} s3://${s3bucket}/frozen/${index}/${startDate}_to_${endDate}/${warm} --recursive --region ${region} 2>&1 | tr "\r" "\n" > >(awk '{print strftime("%Y-%m-%d:%H:%M:%S ") $0}' >> $LOG)
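
Both errors above are consistent with the script being run without an argument: `$1` is the bucket path that Splunk passes when it invokes the script, so with no argument `set -u` reports it as unbound, and without `set -u` the empty value reaches `date` as just `@`. A minimal sketch of a guard that could sit at the top of such a script follows; the function name `require_bucket_arg` is hypothetical.

```shell
#!/bin/bash
# Sketch: fail with a clear message when the bucket path argument is missing
# or does not point at a directory, instead of hitting "unbound variable".
require_bucket_arg() {
    if [ "$#" -lt 1 ] || [ -z "${1:-}" ]; then
        echo "usage: <script> <path-to-bucket-directory>" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
}
```

In the script it would be called as `require_bucket_arg "$@" || exit 1` before `bucket=$1`. For a manual test, the script needs the full path of a bucket directory that exists locally on the indexer as its first argument.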

Steve_G_
Splunk Employee

See https://docs.splunk.com/Documentation/Splunk/7.3.2/Indexer/SmartStoredataretention for details on data retention in SmartStore indexes.

In short, these are the available data retention settings for SmartStore indexes:

  • frozenTimePeriodInSecs - Time-based, same as non-SmartStore
  • maxGlobalDataSizeMB - Size-based, SmartStore only
  • maxGlobalRawDataSizeMB - Size-based, SmartStore only

When any of these limits are reached, buckets roll from warm to frozen. (In SmartStore, buckets roll directly from warm to frozen.)

Each of these settings has a default, so if you do not explicitly configure them, freezing behavior will be based on the defaults.

Other freezing behavior is the same as for non-SmartStore indexes. For example, the coldToFrozenScript setting should work as documented.

Also, the cache retention settings function independently from the data retention settings. Cache retention is really an entirely separate issue from data retention.


mccartneyc
Path Finder

Hi Steve, thanks for replying. I had gone through that documentation, but didn't see that smartstore would use the default, but that makes sense. Unfortunately though, I can't get the coldToFrozenScript portion to work properly.

I've applied a script and tried modifying it, but here is the issue. SmartStore does not use colddb, and when the script gets triggered for freezing, it's set up to run as ("/usr/bin/python3" "/opt/splunk/etc/slave-apps/ColdToFrozenS3/bin/coldToFrozenS3.py" Bucket_Name). SmartStore reports the bucket name as smartstore_test~9~044B61B4-FEA8-4CC9-BEA9-C694C082BECA. That's not the actual name of the bucket in the manifest, but I'm guessing it's the one that the cache manager uses to identify the bucket in S3.

Here is the error I get when coldToFrozenScript gets run:
10-14-2019 12:13:15.808 -0400 WARN BucketMover - RemoteStorageAsyncFreezer freeze failed for bid=smartstore_test~9~044B61B4-FEA8-4CC9-BEA9-C694C082BECA since coldToFrozenScript="/usr/bin/python3" "/opt/splunk/etc/slave-apps/ColdToFrozenS3/bin/coldToFrozenS3.py" could not be run due to exception=std::exception

How should a script be structured for a SmartStore index with coldToFrozenScript set up, so that it takes the value of sys.argv[1] (the bucket name/ID that Splunk passes to it) and copies that bucket to a directory or syncs it to S3?

The script would need to be able to grab the SmartStore bucket that is being frozen from S3 and move the frozen bucket, but I don't see that kind of functionality working with the test scripts I've made, the ones I've found online, or the example that Splunk provides.


Steve_G_
Splunk Employee

The example script that ships with the product is just that, an example. It's likely to need considerable modification to suit your environment.

But instead of creating your own script, have you tried just setting the coldToFrozenDir attribute to specify a location for the archive? If you use that attribute, Splunk will handle the archiving process for you.


mccartneyc
Path Finder

Yeah, I modified it heavily to test, created one from scratch, and used/modified other examples people have posted online. They all work fine for non-SmartStore indexes, but not for SmartStore indexes. Because of this, I figure the script gets called to copy the bucket being frozen, but because the warm bucket to freeze exists in S3 and not locally, it fails. I'm testing some things out now that I hope may help.

The coldToFrozenDir works as well, but we are testing SmartStore to reduce disk usage, and archiving back to local disk from S3 doesn't help that much. I'm trying to have hot buckets local, warm buckets in an S3 bucket, and coldToFrozenScript set up to archive the bucket directly to another S3 bucket, essentially keeping only hot buckets and the bucket cache on the local server.
