I am trying to migrate data from local storage to a remote store and would like to understand the best way to monitor the progress.
The migration from local storage to the remote store (such as S3) starts when a cluster bundle with the remote-store configuration is deployed from the cluster master to the cluster peers. The migration itself happens on the indexers. During migration, the peers upload all searchable copies to the remote store. When multiple peers upload the same copy of a bucket to the remote store, only one copy ends up uploaded and retained.
Once the migration is complete on an indexer, it will not be attempted again (if needed, it can be manually triggered). Below are sample searches you can use to look at different aspects of the migration process.
1) Tracing the start of the migration (splunkd.log component DatabaseDirectoryManager has one entry per index):
index=_internal source=*splunkd.log DatabaseDirectoryManager "Remote storage migration needed" | timechart count by idx
11-21-2018 06:38:20.514 +0000 INFO DatabaseDirectoryManager - Remote storage migration needed for idx=main for a bucket count=34
This event has the index name and the count of buckets to be migrated.
2) Tracking the end of the migration (this covers all indexes):
index=_internal source=*splunkd.log component=CacheManager "Remote storage migration" completed
11-21-2018 06:38:28.957 +0000 INFO CacheManager - Remote storage migration of buckets and summaries completed (durationsec=8 uploadjobs=67)
Note: you can compare uploadjobs with the total sum of count across all indexes from the search in step 1.
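As a minimal sketch of that cross-check, assuming the key=value log formats shown in the sample events above (the second "migration needed" line here is invented for illustration; in practice you would feed in exported search results):

```python
import re

# Sample splunkd.log lines in the formats shown above: one
# "migration needed" line per index, plus the CacheManager completion line.
lines = [
    "11-21-2018 06:38:20.514 +0000 INFO DatabaseDirectoryManager - Remote storage migration needed for idx=main for a bucket count=34",
    "11-21-2018 06:38:21.101 +0000 INFO DatabaseDirectoryManager - Remote storage migration needed for idx=_internal for a bucket count=33",
    "11-21-2018 06:38:28.957 +0000 INFO CacheManager - Remote storage migration of buckets and summaries completed (durationsec=8 uploadjobs=67)",
]

per_index = {}      # idx -> bucket count to migrate
upload_jobs = None  # uploadjobs from the completion line

for line in lines:
    m = re.search(r"migration needed for idx=(\S+) for a bucket count=(\d+)", line)
    if m:
        per_index[m.group(1)] = int(m.group(2))
    m = re.search(r"completed \(durationsec=\d+ uploadjobs=(\d+)\)", line)
    if m:
        upload_jobs = int(m.group(1))

# If the migration uploaded everything, the per-index counts should sum
# to the reported uploadjobs.
print(sum(per_index.values()), upload_jobs)
```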
3) Here is an SPL search that can also be used to see the progress of the migration, but it has some limitations:
| rest /services/admin/cacheman/_metrics splunk_server=<INDEXERS>
| rename migration.total_jobs AS migration_jobs_total, migration.current_job AS migration_jobs_complete
| eval migration_jobs_remaining=migration_jobs_total-migration_jobs_complete
| fillnull migration.end_epoch value="-"
| stats count by splunk_server migration.start_epoch migration.end_epoch migration.status migration_jobs_total migration_jobs_complete migration_jobs_remaining
| eval percent_complete=round((migration_jobs_complete/migration_jobs_total)*100,1)
| eval current_time_secs=now()
| eval time_elapsed_secs=if('migration.status'="finished",('migration.end_epoch'-'migration.start_epoch'),(current_time_secs-'migration.start_epoch'))
| eval secs_per_job=time_elapsed_secs/migration_jobs_complete
| eval time_remaining_secs=migration_jobs_remaining*secs_per_job
| eval seconds_per_job=round((secs_per_job),2)
| convert timeformat="%+" ctime(migration.start_epoch) AS migration_start_time
| convert timeformat="%+" ctime(migration.end_epoch) AS migration_end_time
| eval migration_end_time=if('migration.status'="finished",migration_end_time,"-")
| convert timeformat="%+" ctime(current_time_secs) AS current_time
| eval current_time=if('migration.status'="finished","-",current_time)
| eval time_elapsed_hours=round(time_elapsed_secs/3600,2)
| eval time_remaining_hours=round((time_remaining_secs/3600),2)
| table splunk_server migration.status migration_start_time migration_end_time current_time migration_jobs_total migration_jobs_complete migration_jobs_remaining percent_complete time_elapsed_hours time_remaining_hours seconds_per_job
The above search can sometimes be misleading; for example, if the indexer crashes or is shut down, the search may still show the migration as finished at 100%.
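The progress/ETA arithmetic that the eval chain above performs can be sketched in plain Python (the input values below are made-up illustrations, not real endpoint output):

```python
def migration_eta(total_jobs, current_job, start_epoch, now_epoch):
    """Mirror the SPL evals: percent complete and estimated seconds remaining."""
    remaining = total_jobs - current_job
    percent_complete = round(current_job / total_jobs * 100, 1)
    elapsed_secs = now_epoch - start_epoch
    secs_per_job = elapsed_secs / current_job          # average pace so far
    time_remaining_secs = remaining * secs_per_job     # linear extrapolation
    return percent_complete, time_remaining_secs

# Hypothetical values: 100 jobs total, 25 done after 500 seconds.
# 25% complete, 20 s/job, so roughly 1500 s remain.
print(migration_eta(100, 25, 0, 500))
```

Because the estimate is a linear extrapolation from the average pace so far, it inherits the same caveat as the SPL: it says nothing about whether the indexer is still actually making progress.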
4) Upload operations can be monitored:
index=_internal source=*metrics.log TERM(group=cachemgrupload) | timechart span=1s sum(queued) AS queued, sum(succeeded) AS succeeded by host
10-25-2018 10:48:06.599 +0000 INFO Metrics - group=cachemgrupload, elapsedms=17017, kb=124372, succeeded=1
5) Upload speed:
index=_audit (action=localbucketupload AND sourcetype=audittrail) | eval elapseds=elapsedms/1000 | eval kbps=kb/elapseds | eval mbps=kbps/1024 | timechart span=1s max(mbps) by host
Sample event:
Audit:[timestamp=10-25-2018 10:47:37.615, user=n/a, action=localbucketupload, info=completed, cacheid="bid|internal~40~C3912E39-C49C-4A24-B119-AA4B13C0F3F1|", localdir="/home/splunker/splunk/var/lib/splunk/internaldb/db/db1540464387154046158940C3912E39-C49C-4A24-B119-AA4B13C0F3F1", kb=124372, elapsed_ms=17017][n/a]
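Applying that eval chain to the numbers in the sample event gives the per-upload throughput (this is just the same arithmetic the SPL performs, shown step by step):

```python
# Values from the sample audit event above.
kb = 124372        # data uploaded, in KB
elapsedms = 17017  # upload duration, in milliseconds

elapseds = elapsedms / 1000   # seconds
kbps = kb / elapseds          # KB per second
mbps = kbps / 1024            # MB per second

print(round(mbps, 2))  # ≈ 7.14 MB/s for this bucket upload
```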
6) Role of the file .buckets_synced_to_remote_storage in migration:
find . -type f -name .buckets_synced_to_remote_storage
./var/lib/splunk/audit/db/.buckets_synced_to_remote_storage
./var/lib/splunk/_internaldb/db/.buckets_synced_to_remote_storage
./var/lib/splunk/_introspection/db/.buckets_synced_to_remote_storage
./var/lib/splunk/_telemetry/db/.buckets_synced_to_remote_storage
./var/lib/splunk/fishbucket/db/.buckets_synced_to_remote_storage
./var/lib/splunk/historydb/db/.buckets_synced_to_remote_storage
./var/lib/splunk/defaultdb/db/.buckets_synced_to_remote_storage
./var/lib/splunk/summarydb/db/.buckets_synced_to_remote_storage
At start-up, if an index is S2-enabled, we check whether buckets need to be uploaded. To do so, we look for the file $homePath/.buckets_synced_to_remote_storage. The presence of this file indicates that we don't need to upload buckets to the remote storage and therefore no migration needs to happen.
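As a sketch of that marker-file check (this only demonstrates the existence test described above, not splunkd's internal logic; the temp directory stands in for an index's $homePath):

```python
import os
import tempfile

MARKER = ".buckets_synced_to_remote_storage"

def migration_needed(home_path):
    # If the marker file exists, the buckets were already uploaded,
    # so no migration is needed for this index.
    return not os.path.exists(os.path.join(home_path, MARKER))

home_path = tempfile.mkdtemp()          # stand-in for $homePath
print(migration_needed(home_path))      # marker absent: migration would run

open(os.path.join(home_path, MARKER), "w").close()  # simulate a completed migration
print(migration_needed(home_path))      # marker present: migration is skipped
```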
7) Here is another search to confirm the migration on the indexers:
./splunk search "| rest /services/admin/cacheman | search cm:bucket.stable=0 | stats count"   # should return zero