I would like to understand the directory structure for Splunk bucket on the remote store.
The directory scheme is as follows when we upload a bucket to SmartStore:
{2-letter hash} / {2-letter hash} / {bucket_id_number}~{origin_guid} / guidSplunk-{uploader_guid} / (bucket contents)
The two two-letter hashes are determined by the first 4 characters of the sha1 of "{bucket_id_number}~{origin_guid}" (it doesn't care about et/lt/index).
For example:
my bucket on local storage is:
$SPLUNK_HOME/_internal/db/db_1533256878_1533256720_10_33A1AEFB-8C83-4005-80F0-6BEBC769EBE0
gets uploaded into remote storage as:
_internal/db/56/ba/10~33A1AEFB-8C83-4005-80F0-6BEBC769EBE0/guidSplunk-33A1AEFB-8C83-4005-80F0-6BEBC769EBE0
(note: the _internal/db prefix comes from my S2 remote storage settings in indexes.conf)
because:
$ echo -n "10~33A1AEFB-8C83-4005-80F0-6BEBC769EBE0" | sha1sum
56bae43a9604d078d1d617ff9d63faa0a21302e0 -
Note that the
56ba → 56/ba
is used as the leading two directories of our bucket.
Also note that we identify the uploader of the bucket - it is quite possible for the same bucket to be uploaded twice by different indexers, resulting in multiple copies inside the bucket folder (for example, both guidSplunk-GUID1 and guidSplunk-GUID2). The receipt.json specifies which copy all users of the bucket (readers/downloaders) should use.
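As a quick sanity check, here is a small shell sketch (assuming bash plus the standard sha1sum and cut utilities; the uploader GUID at the end is just a placeholder for whichever peer performed the upload) that reproduces the prefix computation for the example bucket above:
BID="10~33A1AEFB-8C83-4005-80F0-6BEBC769EBE0"    # {bucket_id_number}~{origin_guid}
HASH=$(echo -n "$BID" | sha1sum | cut -c1-4)     # first 4 hex chars of the sha1 -> 56ba
echo "${HASH:0:2}/${HASH:2:2}/${BID}/guidSplunk-<uploader_guid>"
# prints: 56/ba/10~33A1AEFB-8C83-4005-80F0-6BEBC769EBE0/guidSplunk-<uploader_guid>
The index-specific remote path prefix from indexes.conf (here _internal/db) goes in front of this.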
At a high level, these are the steps:
i) Once the bucket rolls to warm, the "remote_storage_upload_timeout" timer is started on the target peers, the bucket is registered with the CacheManager on the source, the bucket is optimized, the source uploads it to remote storage, and the registration ends. Once the source peer has uploaded the bucket to remote storage, it notifies the target peers that the bucket is on remote storage. After the target peers receive that message from the source peer, they cancel the registration with the CacheManager, mark the bucket as stable, and evict the bucket. Below is a breakdown of these steps with the corresponding log messages -
Source peer rolls the bucket
05-15-2019 15:46:44.844 +0000 INFO HotBucketRoller - finished moving hot to warm bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9 idx=perfmon from=hot_v1_468 to=db_1557754467_1557734922_468_FA94F613-032D-4C8E-9D04-EFA3F5E923C9 size=397975552 caller=lru maxHotBuckets=10, count=11 hot buckets,evicting_count=1 LRU hots
ii) Done key received on target peer (which means replication from the source is complete)
05-15-2019 15:46:44.879 +0000 INFO S2SFileReceiver - event=onDoneReceived replicationType=eJournalReplication bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9
05-15-2019 15:46:44.879 +0000 INFO S2SFileReceiver - about to finalize from close bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9
iii) Target peer starts the "remote_storage_upload_timeout" timer, so that if it doesn't hear from the source peer before the timer expires, it can start the upload of the bucket itself. It also rolls the bucket from hot→warm on its end.
INFO CMSlave - bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9 added so this target peer can assume responsibility of upload later
05-15-2019 15:46:44.884 +0000 INFO S2SFileReceiver - event=rename bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9 from=/opt/splunk/var/lib/splunk/perfmon/db/468_FA94F613-032D-4C8E-9D04-EFA3F5E923C9 to=/opt/splunk/var/lib/splunk/perfmon/db/db_1557754467_1557734922_468_FA94F613-032D-4C8E-9D04-EFA3F5E923C9
05-15-2019 15:46:44.884 +0000 INFO CMSlave - bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9 Transitioning status from=StreamingTarget to=Complete for reason="hot success (target)"
iv) Meanwhile, the source starts the upload of the bucket, since it has finished the optimize/repair process for the bucket. It also saves the state of the files in the bucket directory locally by writing to the file "cachemanager_local.json".
05-15-2019 15:46:54.718 +0000 INFO DatabaseDirectoryManager - cid="bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|" uploading the bucket to remote storage since optimize/repair process has completed successfully
05-15-2019 15:46:54.723 +0000 INFO CacheManager - action=upload, cacheId="bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|", status=attempting
05-15-2019 15:47:00.786 +0000 INFO CacheManager - action=upload, cacheId="bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|", status=succeeded, elapsed_ms=6063
Corresponding entry in audit.log for the bucket upload
05-15-2019 15:46:54.723 +0000 INFO AuditLogger - Audit:[timestamp=05-15-2019 15:46:54.723, user=n/a, action=local_bucket_upload, info=started, cache_id="bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|", prefix=reedexpo/perfmon/db/4d/e7/468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9/guidSplunk-FA94F613-032D-4C8E-9D04-EFA3F5E923C9][n/a]
05-15-2019 15:47:00.817 +0000 INFO AuditLogger - Audit:[timestamp=05-15-2019 15:47:00.817, user=n/a, action=local_bucket_upload, info=completed, cache_id="bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|", local_dir="/opt/splunk/var/lib/splunk/perfmon/db/db_1557754467_1557734922_468_FA94F613-032D-4C8E-9D04-EFA3F5E923C9", kb=382940, elapsed_ms=6095][n/a]
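If you want to confirm whether a particular bucket was uploaded, one quick check (a sketch, assuming the default log location under $SPLUNK_HOME/var/log/splunk) is to grep audit.log for the upload events of that bucket id:
# look for started/completed local_bucket_upload audit events for one bucket
grep 'action=local_bucket_upload' $SPLUNK_HOME/var/log/splunk/audit.log | grep '468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9'
An info=completed line like the one above means the source peer finished its upload.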
NOTE: "cachemanager_local.json" is a local file that resides in db directory for warm buckets. It is used to maintain the state of what files are present locally in the disk. We update this file when we are either about to upload the bucket or we download the bucket contents when a search opens the bucket or we cancel the upload.
The contents of this file look something like this -
v) Source peer reports upload status to the replication target(s)
05-15-2019 15:47:00.817 +0000 INFO CMSlave - bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9 upload status being reported to the replicated targets
05-15-2019 15:47:00.818 +0000 INFO CMRepJob - running job=CMReportBucketInStableStorageJob bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9 ot_guid=5B2CABAA-22E8-4B25-AE31-C089D69FE13D ot_hp=INDEXER:8089
vi) Target peer(s) receive the notification from the source peer and update their metadata with the remote storage metadata, after confirming that the bucket is present on remote storage.
05-15-2019 15:47:00.844 +0000 INFO CMSlave - bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9 reported to be on remote storage by upload peer, will confirm it is present by checking the remote storage
05-15-2019 15:47:00.871 +0000 INFO DatabaseDirectoryManager - cid="bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|" found to be on remote storage
05-15-2019 15:47:00.871 +0000 INFO IndexerIf - Asked to update bucket manifest values, bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9
05-15-2019 15:47:00.903 +0000 INFO DatabaseDirectoryManager - idx=perfmon Writing a bucket manifest in hotWarmPath='/opt/splunk/var/lib/splunk/perfmon/db', pendingBucketUpdates=0 . Reason='Updated metadata of bucket with remote storage metadata, bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9'
NOTE: The step where the target is notified by the source about the bucket upload has been made optional in recent versions, and the feature is turned off by default. In that case, targets won't receive the notification from the source that it has uploaded the bucket; instead, the target will check for the bucket on remote storage after remote_storage_upload_timeout expires and, if it is present, simply marks the bucket stable as part of a cancelled upload. Below is the configuration that was introduced to make this feature optional -
report_remote_storage_bucket_upload_to_targets =
* Only valid for 'mode=slave'.
* For a remote storage enabled index, this attribute specifies whether
the source peer reports the successful bucket upload to target peers.
This notification is used by target peers to cancel their upload timers
and synchronize their bucket state with the uploaded bucket on remote
storage.
* Do not change the value from the default unless instructed by
Splunk Support.
* Default: false
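For illustration only (the exact placement is my assumption, based on the mode=slave note above): on a peer this setting would live in server.conf under the [clustering] stanza, with other clustering settings omitted here:
[clustering]
mode = slave
report_remote_storage_bucket_upload_to_targets = true
Per the spec text above, leave it at the default (false) unless Splunk Support instructs otherwise.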
vii) Now the target peer cancels the registration of the bucket with the CacheManager, marks the bucket as stable, and then evicts the bucket locally.
05-15-2019 15:47:00.905 +0000 INFO CMSlave - bid=perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9 removed from the replicatedBucketsUploadTimeout map
05-15-2019 15:47:00.912 +0000 INFO CacheManager - cancel registering new cacheId="bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|" for search sid=bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|
05-15-2019 15:47:00.912 +0000 INFO CacheManager - Making cacheId="bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|" stable as part of cancelled upload
The corresponding audit.log entry which logs eviction of the bucket -
05-15-2019 15:47:00.952 +0000 INFO AuditLogger - Audit:[timestamp=05-15-2019 15:47:00.952, user=splunk-system-user, action=local_bucket_evict, info=completed, cache_id="bid|perfmon~468~FA94F613-032D-4C8E-9D04-EFA3F5E923C9|", kb=389622, elapsed_ms=15, files="strings_data,sourcetypes_data,sources_data,hosts_data,lex,tsidx,bloomfilter,journal_gz,deletes,other"][n/a]
the "files" evicted in the above log entry. If you are evicting all the files in bucket directory, that usually means that target is evicting the bucket because source or some other peer has already uploaded the bucket. You will encounter other local_bucket_evict logs in audit.log which will have different "files" to be evicted, which can be due to other reasons covered later in this page(most commonly "deletes" files, which are evicted due to primary changes).
Excellent post, thank you!
If you are testing, I can join Zoom and look at what you are seeing (around 11AM PST).
Thank you very much. I have been looking out for this exact info for a while.
Hi rbal,
How likely is it that the same bucket is uploaded twice? My understanding was that only the originating copy would be uploaded, with the indexers that hold the replicated copies checking whether the bucket has already been uploaded before attempting to upload their own.
Thanks
It can happen when the target peers don't receive an acknowledgment from the primary about its upload within remote_storage_upload_timeout (default 60 seconds). The target peer then checks S3 for the receipt.json and, if it can't find one, uploads its own copy to S3 along with its receipt.json. This is a fallback mechanism: if the primary peer goes down for any reason, Splunk still ensures that a copy of the bucket is uploaded by another peer.
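If you want to check this for a given bucket yourself, a rough way (a sketch, assuming an S3 remote store, the AWS CLI installed, and placeholder bucket/prefix names - substitute your own) is to list the bucket folder and look for receipt.json and the guidSplunk-* directories:
# <xx>/<yy> are the two hash directories described in the accepted answer
aws s3 ls --recursive s3://<your-s3-bucket>/<path_prefix>/<index>/db/<xx>/<yy>/<bucket_id_number>~<origin_guid>/
More than one guidSplunk-<GUID> prefix in the output means more than one peer uploaded a copy; receipt.json tells readers which copy to use.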
Thanks very much rbal & srajarat for your replies, and for your detailed steps below, rbal. My understanding was that it was very unlikely for a bucket to be uploaded twice. The reason I asked was the "very possible" in the accepted answer - I just wanted to make sure I wasn't missing anything, thanks.
I think it's highly unlikely that a bucket will get uploaded twice. Why do you think that a bucket was uploaded twice?
I am seeing this behavior with a classic-to-SmartStore migration in a multisite setup on 8.0, where every indexer that has a copy uploads to S3; since each has a different uploader_guid, the copies don't get overwritten, except for the receipt.json file, which will refer to the last one to be uploaded. I am now testing the same on a single site to determine whether this is an issue only in a multi-site setup.
FYI, I had the site RF at 3 (origin:2, total:3) and I had 3 copies of the bucket in S3, meaning the two copies from the origin site also got uploaded, which is weird. Clearly there seems to be a different code path for migration vs. the standard upload as part of ingest.
For S2 (SmartStore), one of the recommendations is to set RF=SF.
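Concretely (an illustration with placeholder values, not a prescription), that means setting replication_factor and search_factor to the same value on the cluster master in server.conf, e.g.
[clustering]
mode = master
replication_factor = 3
search_factor = 3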