SmartStore: How to manually upload buckets to the remote store after the indexer migration is over?

rbal_splunk
Splunk Employee

We have a few different requirements:
i) Upload multiple TB of legacy standalone buckets to an index that has already been migrated to the remote store.
ii) Upload a few legacy standalone buckets to an index after it has already been migrated.

rbal_splunk
Splunk Employee

Before answering this question, we need to understand cachemanager_upload.json.

This file resides at $SPLUNK_HOME/var/run/splunk/cachemanager_upload.json and is used to migrate buckets to SmartStore.

The sample below shows the list of buckets to be uploaded to the remote store.

cat ./var/run/splunk/cachemanager_upload.json | sed 's/,/\n/g'
{"bucket_ids":["bid|_audit~100~761A77A2-6676-4BF9-83CD-1CB243ED61BF|"
"bid|_audit~103~EDEAC3E5-E0B3-45B9-84B3-A1E087035148|"
"bid|_audit~104~761A77A2-6676-4BF9-83CD-1CB243ED61BF|"
"bid|_audit~108~EDEAC3E5-E0B3-45B9-84B3-A1E087035148|"
"bid|_audit~110~761A77A2-6676-4BF9-83CD-1CB243ED61BF|"
"bid|_audit~124~761A77A2-6676-4BF9-83CD-1CB243ED61BF|"

As part of the migration, the DDM (DatabaseDirectoryManager) pre-registers the buckets with the cache manager via bulk register; that is, it writes the buckets to cachemanager_upload.json under "$SPLUNK_HOME/var/run/splunk/". This file maintains the list of buckets that still need to be uploaded to remote storage. Whenever a bucket needs to be uploaded, its bid is flushed to this file, so that if Splunk crashes or is restarted before the upload completes, the upload resumes from where it left off. After the upload is done, the entry is removed from the file, so that the in-memory state stays in sync with the state on disk.
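To check which buckets are still waiting for upload, the file can be read from the shell. The following is only a rough sketch: it assumes the file looks like the sample above (bucket IDs stored as quoted strings starting with "bid|"), and the helper names list_pending_buckets and is_pending are made up for illustration.

```shell
# Default location taken from the path mentioned above; SPLUNK_HOME must be set,
# or the file can be passed explicitly as the first argument.
upload_file_default="${SPLUNK_HOME:-}/var/run/splunk/cachemanager_upload.json"

# Print one pending bucket ID per line.
# Assumption: IDs appear in the file as quoted strings beginning with "bid|".
list_pending_buckets() {
  grep -o '"bid|[^"]*"' "${1:-$upload_file_default}" | tr -d '"'
}

# Exit status 0 if the given bucket ID is still listed in the file, non-zero otherwise.
# Usage: is_pending FILE BUCKET_ID
is_pending() {
  list_pending_buckets "$1" | grep -F -x -q -- "$2"
}
```

For example, `list_pending_buckets | wc -l` gives a quick count of buckets still queued, and `is_pending "$upload_file_default" "bid|_audit~100~761A77A2-6676-4BF9-83CD-1CB243ED61BF|"` tells you whether one specific bucket is still in the list.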

This file can also be updated manually, using bulk_register, to upload a bucket to the remote store. Hit the cache manager's endpoint for a bucket to request that the bucket be added to cachemanager_upload.json.

curl -k -u <user>:<passwd> -X POST https://<uri>/services/admin/cacheman/_bulk_register -d cache_id="<cacheId>"

Example:

curl -k -u admin:changeme -X POST https://localhost:10041/services/admin/cacheman/_bulk_register -d cache_id="bid|taktaklog~1~F7770FEB-F5A6-4846-A0BB-DDC05126BBF6|"

Here is an example that uploads multiple buckets in one call:

curl --netrc-file nnfo -k -X POST https://localhost:8089/services/admin/cacheman/_bulk_register  -d cache_id="bid|nccihub~17~024011E7-E61E-45CE-82DE-732038D5C276|" -d cache_id="bid|nccihub~22~024011E7-E61E-45CE-82DE-732038D5C276|" -d cache_id="bid|nccihub~28~024011E7-E61E-45CE-82DE-732038D5C276|" -d cache_id="bid|nccihub~34~024011E7-E61E-45CE-82DE-732038D5C276|" -d cache_id="bid|nccihub~12~E646664A-D351-41E4-BBE7-5B02A08C44C9|" -d cache_id="bid|nccihub~17~F876C294-3E3E-488A-8344-16727AC34C52|" -d cache_id="bid|nccihub~17~E646664A-D351-41E4-BBE7-5B02A08C44C9|"

When the bulk register REST endpoint is called, it adds the bucket to cachemanager_upload.json, and the subsequent restart of the indexer uploads the bucket to the remote store.
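For more than a handful of buckets, typing every -d cache_id=... argument by hand gets error-prone. A minimal sketch of generating the same multi-bucket call from a list is shown here. The function name bulk_register_from_file and the list file (one bucket ID per line) are hypothetical; the endpoint, the --netrc-file option and the repeated -d cache_id= form are the ones used in the curl examples in this post.

```shell
# Usage: bulk_register_from_file MGMT_URI NETRC_FILE BID_LIST
#   MGMT_URI   e.g. https://localhost:8089
#   NETRC_FILE credentials file passed to curl --netrc-file
#   BID_LIST   hypothetical text file with one bucket ID per line, e.g.
#              bid|nccihub~17~024011E7-E61E-45CE-82DE-732038D5C276|
bulk_register_from_file() {
  mgmt_uri=$1
  netrc_file=$2
  bid_list=$3

  # Reset the positional parameters and reuse them as the curl argument list.
  set --
  while IFS= read -r bid || [ -n "$bid" ]; do
    [ -n "$bid" ] || continue            # skip empty lines
    set -- "$@" -d "cache_id=$bid"       # one -d cache_id=... pair per bucket
  done < "$bid_list"

  if [ "$#" -eq 0 ]; then
    echo "no bucket IDs found in $bid_list" >&2
    return 1
  fi

  # Same request shape as the manual multi-bucket example.
  curl --netrc-file "$netrc_file" -k -X POST \
    "$mgmt_uri/services/admin/cacheman/_bulk_register" "$@"
}
```

It would be called as `bulk_register_from_file https://localhost:8089 nnfo buckets.txt`, where buckets.txt is whatever list of bucket IDs you have prepared. As described above, the registered buckets are then uploaded on the next restart of the indexer.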

You may hit a scenario where the customer plans to bring multiple TB of legacy standalone buckets into a SmartStore cluster deployment.

When an indexer is first enabled for SmartStore, its buckets are migrated to the remote store. When the migration finishes, Splunk creates $SPLUNK_HOME/var/lib/splunk//db/.buckets_synced_to_remote_storage, and any bucket added after that point is not uploaded.

For the requirement to add "multiple TB of legacy standalone buckets to a SmartStore cluster deployment", we should use migration rather than bulk_register. In this case, you can remove $SPLUNK_HOME/var/lib/splunk//db/.buckets_synced_to_remote_storage and restart the indexer, and the buckets will be re-uploaded.
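The marker removal can be wrapped in a small helper so that it fails loudly when pointed at the wrong directory. This is only a sketch: clear_sync_marker is a made-up name, and the db directory that holds the marker is passed in as an argument rather than guessed.

```shell
# Usage: clear_sync_marker DB_DIR
#   DB_DIR is the db directory that contains .buckets_synced_to_remote_storage
clear_sync_marker() {
  marker="$1/.buckets_synced_to_remote_storage"
  if [ -f "$marker" ]; then
    # Removing the marker makes the indexer run the migration again on restart.
    rm -f -- "$marker" && echo "removed $marker"
  else
    echo "no marker at $marker; nothing to do" >&2
    return 1
  fi
}
```

After the marker is removed, restart the indexer (for example with "$SPLUNK_HOME/bin/splunk" restart) so that the migration runs again.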

goelt2000
Explorer

Hi @rbal_splunk, is this relevant for Splunk 9.0.4 (build de405f4a7979) as well?

I tried adding a bid to the file, but I do not see it getting uploaded to SmartStore, and I cannot find the bucket ID anywhere in the logs.

Please advise.

Thanks!

gjanders
SplunkTrust

I used this approach:

"For the requirement to add "multiple TB of the legacy standalone bucket to Smarstore cluster deployment", we should use migration and not bulk_register. In this case, you can remove $SPLUNK_HOME/var/lib/splunk//db/.buckets_synced_to_remote_storage and restart the indexer and the bucket will be re-uploaded."

My only recommendation: if you migrate the buckets from a single-site indexer cluster to a multi-site indexer cluster, consider changing the (old) cluster to multi-site before the migration.

I hit a strange issue months later: as the buckets froze, the cluster manager/master node kept querying SmartStore for them, triggering 404 errors.

The issue disappeared after a restart of the cluster master, but it happened for every bucket that was migrated from the single-site cluster using the above method, so I'm wondering whether the fact that the buckets were not multi-site was part of the issue...

An alternative to a restart (as mentioned in the support case) would be:

curl -k -u admin:<password> -X POST "https://<cm>:<mgmt_port>/services/cluster/master/buckets/<bucket_id>/remove_all"
