I'd like to better understand what behaviors SmartStore will exhibit in my environment and how to manage them. What can I do to prepare my environment for SmartStore?
SmartStore (S2) behaviors, in no particular order. I will update this post as new information is learned.
Evictions don't always seem to show up on the S2 pages of the Monitoring Console (MC). The following search will show them:
index=_internal sourcetype=splunkd source=*splunkd.log action=evictDeletes
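If you want to check a copied splunkd.log offline instead of through search, here is a minimal Python sketch. It assumes the raw log lines contain the literal text `action=evictDeletes`, which is the key=value pair the search above filters on; the function names are hypothetical.

```python
import re
from typing import Iterable

# Assumption: eviction events appear in splunkd.log as lines containing the
# literal key=value pair "action=evictDeletes".
EVICT_PATTERN = re.compile(r"\baction=evictDeletes\b")


def count_evict_deletes(lines: Iterable[str]) -> int:
    """Count log lines that record an evictDeletes action."""
    return sum(1 for line in lines if EVICT_PATTERN.search(line))


def count_evict_deletes_in_file(path: str) -> int:
    """Open a (copied) splunkd.log and count evictDeletes lines in it."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_evict_deletes(f)
```

For example, `count_evict_deletes_in_file("splunkd.log")` on a copy pulled from an indexer gives a quick per-host count to compare against what the MC shows.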
Starting in 7.2.4, additional metrics were added to count downloaded bytes. Prior to this version, Splunk had no metrics for the (potentially significant) network and storage impact that a rolling restart induces.
During a rolling restart, as each indexer is marked to go down:
- The cluster master (CM) begins to reassign primacy for the buckets on the departing indexer to other indexers.
- All buckets on the indexer being restarted are marked for eviction, effectively flushing its cache.
- As indexers in the cluster are restarted, the others start downloading buckets from S3 to satisfy search requests. Because all the indexers not being restarted will likely request downloads at once, this can take a heavy toll on the local network and storage if they are not prepared for this much data transfer in a short period of time.
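To get a rough sense of how much transfer a full rolling restart can trigger, the arithmetic can be sketched as below. This is a back-of-the-envelope worst case under stated assumptions, not a Splunk metric: every restarted indexer's entire cache is flushed and later re-downloaded somewhere in the cluster, downloads share one aggregate link, and protocol overhead is ignored. The function names and the example numbers (10 indexers, 2 TB of cache each, 10 Gbps) are hypothetical.

```python
def rolling_restart_download_bytes(num_indexers: int, cache_bytes_per_indexer: int) -> int:
    """Worst-case bytes re-downloaded from S3 over a full rolling restart.

    Hypothetical model: each restart flushes that indexer's whole cache, and
    the same amount of data is later pulled back down somewhere in the cluster.
    """
    return num_indexers * cache_bytes_per_indexer


def transfer_seconds(total_bytes: int, link_gbps: float) -> float:
    """Seconds to move total_bytes over an aggregate link of link_gbps (no overhead)."""
    bits = total_bytes * 8
    return bits / (link_gbps * 1e9)


TB = 10**12
total = rolling_restart_download_bytes(num_indexers=10, cache_bytes_per_indexer=2 * TB)
hours = transfer_seconds(total, link_gbps=10) / 3600
print(f"{total / TB:.0f} TB, about {hours:.1f} h at 10 Gbps")
```

Plug in your own cache sizes and link speed; even if real restarts re-download only part of each cache, the order of magnitude shows why network and storage need to be sized for this.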
SmartStore allows only one indexer at a time to be primary searchable for a bucket, and no other indexer is allowed to keep a cached copy of that bucket. The CM issues eviction notices to any indexer holding a local copy of that bucket, which ensures that only one indexer searches that bucket and returns results. As a result, a full-cluster rolling restart causes a huge amount of data shuffling and downloading.
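The one-searchable-copy rule can be illustrated with a toy model: given which indexers hold a cached copy of each bucket and which indexer the CM made primary, every non-primary holder gets an eviction notice. This is a hypothetical sketch of the behavior described above, not Splunk's implementation; all names are made up.

```python
from typing import Dict, List, Set, Tuple


def eviction_notices(
    cached: Dict[str, Set[str]],  # bucket id -> indexers holding a local cached copy
    primary: Dict[str, str],      # bucket id -> indexer that is primary searchable
) -> List[Tuple[str, str]]:
    """Return (indexer, bucket) pairs that must evict their cached copy.

    Toy model of the rule above: only the primary may keep a cached copy, so
    every other holder (or every holder, if no primary is assigned) is told to evict.
    """
    notices = []
    for bucket, holders in sorted(cached.items()):
        for indexer in sorted(holders):
            if indexer != primary.get(bucket):
                notices.append((indexer, bucket))
    return notices


def reassign_primacy(primary: Dict[str, str], down: str, replacement: str) -> Dict[str, str]:
    """Move primacy for every bucket owned by `down` to `replacement` (one toy restart step)."""
    return {bucket: (replacement if idx == down else idx) for bucket, idx in primary.items()}
```

For example, with `cached = {"b1": {"idx1", "idx2"}, "b2": {"idx2"}}` and `primary = {"b1": "idx1", "b2": "idx2"}`, only `("idx2", "b1")` is evicted. After `reassign_primacy(primary, "idx2", "idx3")`, idx2 must also evict b2, and idx3 has to download b2 the next time a search needs it, which is the shuffling that repeats for every indexer in a rolling restart.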
Bucket rebalancing works more quickly with S2 than without it, because the only buckets that need to be rebalanced are hot buckets.
Added Nov 2019
Yes. If you do not update coldPath, any searches that involve data from what used to be the cold tier will still download that data from S3 into the coldPath location. After migration to SmartStore, coldPath can certainly be changed. It cannot be the same path as homePath, but it can be on the same volume, with coldPath pointing at colddb instead of db.
Here is an example.
[apache]
remotePath = volume:remote_store/$_index_name
homePath = volume:hot/apache/db
coldPath = volume:hot/apache/colddb
The config above is essentially what I use (once migration completed and colddb was empty).
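Since coldPath must differ from homePath (while being allowed to share a volume), a quick pre-flight check of indexes.conf can catch a copy-paste mistake. This is a minimal sketch, assuming the file parses as INI with Python's configparser; not every Splunk .conf file is guaranteed to parse cleanly that way. The function name is hypothetical, and it compares only the literal values, without resolving `volume:` references.

```python
import configparser
from typing import List


def cold_equals_home(conf_text: str) -> List[str]:
    """Return index stanzas whose coldPath is literally identical to homePath.

    Assumption: indexes.conf parses as INI. Interpolation is disabled so
    Splunk variables like $_index_name are left untouched.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case (homePath, coldPath)
    parser.read_string(conf_text)
    bad = []
    for stanza in parser.sections():
        home = parser.get(stanza, "homePath", fallback=None)
        cold = parser.get(stanza, "coldPath", fallback=None)
        if home is not None and home == cold:
            bad.append(stanza)
    return bad
```

Run against the [apache] stanza shown above, it returns an empty list, because homePath ends in db and coldPath ends in colddb on the same volume.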
Just to further clarify, is the process you tested:
I'm assuming number (4) goes into the cold path, and that going forward everything gradually appears in homePath, since there will (eventually) be no more cold buckets.
Thanks!
Yeah, you got it. #4 goes straight to the directory you pointed coldPath to. In the example above, homePath and coldPath use the same volume but different directories on the same filesystem.
I think it is worth noting explicitly (@davidpaper implied it) that SS/S3 is currently NOT practical for hot/cold buckets/volumes and that it should ONLY be used for warm.
After learning more, apparently a more accurate statement is: "When using SmartStore, there is no need to use cold at all, and Splunk should always be configured to have NO COLD." Or maybe not...?
Once an index is converted to use SmartStore, you are spot on. No more need for a coldPath entry for that index.
Edit: The above is incorrect. You still need a coldPath entry in indexes.conf for the index, but the cold volume shouldn't be actively used once the buckets have been evicted from there.
The index still requires a configured coldPath. See https://docs.splunk.com/Documentation/Splunk/7.3.0/Indexer/MigratetoSmartStore
Also, any buckets that were in coldPath at the time the index was migrated to SmartStore remain in coldPath. See https://docs.splunk.com/Documentation/Splunk/7.3.0/Indexer/SmartStoreindexing
I can confirm that once migration takes place, buckets are no longer stored on cold.
EDIT: I tried to force the CM to populate the cache with "cold buckets" but have failed to replicate this behavior. (Ran a search over a small window from months ago on a known index that would have been cold at the time of migration. No colddb population.)
Strictly speaking, it's true that the bucket contents will no longer be under coldPath, post-migration, as they are now stored remotely. But the bucket metadata should still be under coldPath, and bucket contents will get moved to coldPath if required to fulfill a search.
The entirety of cold storage has 0 files in it, and that remains true after running historical searches that would surely involve some cold data.
After migration to SmartStore, the data on coldPath is not automatically removed unless it is forced out through eviction or through the natural aging process. As Steve pointed out, coldPath will hold metadata stubs, and any search that spans the cold data will download that data from S3 back to coldPath.
Alternatively, after migration, coldPath can be changed to some other location (or even homePath), since the idea behind migration is to keep only a single copy on S3 and reclaim the space from the warm and cold tiers.
This is spot on, and a behavior I hadn't understood until very recently. Reassigning coldPath to homePath is an excellent idea.
Afternoon. Sorry to resurrect this, but I want to understand it better. If I have hot and cold volumes and I migrate all indexes to S2, in theory I should be able to unmount my cold volume as long as I now point coldPath at my home path? My home path is basically where my hot and warm buckets are pointed. Would that be correct?
When my migration was complete, cold was empty and is no longer used for any bucket storage. None. Zero. I don't know if I'm a unicorn, or the posts above are not accurate. Once it's empty, point the location to whatever you want (that exists and has correct perms). It won't be used.
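Before repointing coldPath or unmounting a cold volume, it is worth confirming that nothing is left on it. Here is a minimal Python sketch that counts files under a directory tree and returns a few example paths. The function name and the mount point in the usage comment are hypothetical. Per the earlier posts, metadata stubs may legitimately remain under coldPath, so a non-zero count deserves a closer look rather than an automatic wipe.

```python
import os
from typing import List, Tuple


def files_under(root: str, limit: int = 20) -> Tuple[int, List[str]]:
    """Count files under root (symlinks followed by isfile) and return up to `limit` example paths.

    A root that does not exist yields (0, []), because os.walk silently skips it.
    """
    count = 0
    examples: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                count += 1
                if len(examples) < limit:
                    examples.append(path)
    return count, examples


# Hypothetical usage (the path is an example only):
# count, sample = files_under("/opt/splunk/cold")
# print(count, sample)
```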
I haven't seen this behavior yet. I am assuming you are testing in an indexer cluster environment. I will be testing some more this weekend. I will update.
I've migrated 8 production index clusters (all I have). We started with the S2 beta in 2 environments (7.1.x), then moved all the others on 7.2.3/7.2.4. In every migration from non-S2 to S2, the cold storage was emptied after migration was complete.
This is completely odd. I just migrated some classic indexes that had data in coldPath and even tried evicting all the buckets. When I ran a search, it brought the data back to coldPath.
What you have mentioned does match the documentation/training...
Interesting. The docs and training say coldPath is used as the cache for the buckets that were in that location pre-migration...
I haven't tested this...