Knowledge Management

SmartStore Behaviors

Contributor

I'd like to better understand what behaviors SmartStore will exhibit in my environment and how to manage them. What can I do to prepare my environment for SmartStore?

1 Solution

Contributor

S2 behaviors in no particular order. I will update this post as new information is learned.

  • RF/SF only apply to hot buckets. Once a bucket is rolled, it is uploaded to S3 and any bucket replicas are marked for eviction.
  • The S2 cachemanager will download components of a bucket as searches determine what's needed (bloomfilters, deletes, journal.*, or other components). As such, multiple downloads for the same bucket may appear to be happening, but per component, no duplicate downloads should occur.
  • Evictions don't always show up in the MC on the S2 pages. The following search will find them.

    index=_internal sourcetype=splunkd source=*splunkd.log action=evictDeletes

  • Starting in 7.2.4, additional metrics were added to count downloaded bytes. Prior to this version, Splunk was metrics-blind to the (potentially significant) network/storage impact a rolling restart induces.

  • During a rolling restart, as each indexer is marked to go down:

      ◦ Hot buckets are rolled to warm and uploaded to S3 before process shutdown completes; if upload speed is slow, this can delay restarts.

      ◦ The CM begins to reassign primacy for buckets on the indexer going down to other indexers.

      ◦ All buckets on the indexer being restarted are marked for eviction, effectively flushing its cache.

  • As indexers in the cluster are restarted, the others will start downloading buckets from S3 to satisfy search requests. This can take a heavy toll on the local network and storage if you are not prepared for that level of data transfer in a short window, as all the indexers not being restarted will likely start requesting bucket downloads at once.

  • SmartStore only allows one indexer at a time to be primary searchable for a bucket and no other indexers are allowed to have copies of that bucket cached. The CM will issue eviction notices to any indexers with copies of that bucket locally. This ensures that only 1 indexer will search that bucket and return results. As a result of this, there is a huge amount of data shuffling and downloading that happens during a full cluster rolling restart.

  • Bucket rebalance works more quickly with S2 than without it because the only buckets to rebalance are hot buckets
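To see how much of this download churn is hitting a given indexer during a rolling restart, a search along these lines can help. This is a hedged sketch: CacheManager is a real splunkd logging component, but the exact event text and fields vary by version, so start broad and narrow from what you see in your own _internal data.

    index=_internal sourcetype=splunkd source=*splunkd.log component=CacheManager download
    | timechart count by host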

Added Nov 2019

  • Disk part 1: S2 disk I/O requirements seem to be higher than non-S2, because the bucket downloading process needs to write large amounts of data quickly as the cachemanager populates buckets for search. The default downloading config allows 8 simultaneous downloads. Disks previously able to shoulder the load may not be up to the task of S2's caching requirements. I'm looking at you, RAID5 volumes. By definition it's cache space (and hot bucket space, but hot is replicated), so use RAID0 (stripe) for the fastest disk possible without wasting a MB of available disk space. RAID10 (mirrored stripes) is also acceptable, but cuts usable disk space by 50%.
  • Disk part 2: To expand on the above a bit, S2 performance is about more than just high IOPS; it's about throughput too. Customers running S2 in AWS that have chosen gp2 EBS volumes for hot/cachemanager will likely see severe IO contention, with IO wait % jumping during periods of heavy S2 bucket downloads from remote storage. This is easy to see in top or iostat when users run searches that trigger large bucket evictions and bucket downloads from remote storage. gp2 has a limit of 250MB/sec, which doesn't take long to hit when the network is 10 gig or faster: a fast network means data is written to the kernel buffer cache at a high rate, and when it's time to sync to disk, the storage won't be able to keep up. The io1 EBS type is better, at 1000MB/s, but can still exhaust its throughput capacity during periods of concurrent heavy bucket downloads and searches that tax the storage for both reads and writes, in addition to ingestion and hot bucket replication. In AWS, it is highly recommended to use NVMe for hot/cachemanager (i3 and i3en instance types work very well here) in RAID0, and to consider setting RF/SF=3 (which still applies to hot buckets) to sleep better at night.
  • Disk part 3: If deploying S2 outside of AWS, strive to obtain the fastest disks (throughput and IOPS) available, whether local SSDs or NVMe, to avoid storage bottlenecks getting in the way of your Splunk performance.
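If the default 8 simultaneous downloads is more than your storage can absorb, the cachemanager can be throttled in server.conf. A minimal sketch under the [cachemanager] stanza; the values below are illustrative, so check the server.conf spec for your version before relying on exact defaults:

    [cachemanager]
    # Limit simultaneous bucket downloads from remote storage (ships at 8)
    max_concurrent_downloads = 4
    # Limit simultaneous bucket uploads to remote storage
    max_concurrent_uploads = 8
    # MB of free space to maintain on the cache partition before evicting more
    eviction_padding = 5120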


Explorer

Yes. If you do not update coldPath, any searches that involve data from what used to be the cold tier will still download data from S3 into the coldPath location. After migration to SmartStore, coldPath can certainly be changed. It cannot be the same as homePath, but the volume can be the same, as long as coldPath points at colddb instead of db.

Here is an example.

[apache]
remotePath = volume:remote_store/$_index_name
homePath = volume:hot/apache/db
coldPath = volume:hot/apache/colddb
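For context, the example above assumes volume stanzas along these lines. This is a hedged sketch: the bucket name, endpoint, and filesystem path are placeholders, not values from the original post.

    [volume:remote_store]
    storageType = remote
    path = s3://example-bucket/indexes
    remote.s3.endpoint = https://s3.us-east-1.amazonaws.com

    [volume:hot]
    path = /opt/splunk/var/lib/splunk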

Influencer

The config there is basically what I use (once migration completed and colddb was empty).


SplunkTrust

Just to further clarify, is the process you tested:

  1. Index with warm and cold buckets (non-smartstore)
  2. Migrate said index to smartstore
  3. Change cold path location + restart indexer (or generally evict the cache)
  4. Now searching for data in the timerange where the cold buckets previously were results in them downloading into the new cold path? Or straight into the homePath location?

I'm assuming number (4) goes into the cold path and going forward everything slowly appears in homePath as there will be no more cold buckets (eventually)

Thanks!


Contributor

Yeah, you got it. #4: straight to the directory you pointed coldPath to. In the example above, homePath and coldPath use the same volume, but different directories on the same filesystem.
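One way to verify where the buckets actually land (assuming the apache index from the example above) is dbinspect, which reports each bucket's on-disk path and state:

    | dbinspect index=apache
    | table bucketId, state, path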


Esteemed Legend

I think it is worth noting explicitly (@davidpaper implied it) that SS/S3 is currently NOT practical for hot/cold buckets/volumes and should ONLY be used for warm.


Esteemed Legend

After learning more, apparently a more proper statement is "When using SmartStore, there is no need to use cold at all, and Splunk should always be configured to have NO COLD." Or maybe not...?


Contributor

Once an index is converted to use SmartStore, you are spot on. No more need for a coldPath entry for that index.

Edit: The above is incorrect. You still need a coldPath entry in indexes.conf for the index, but the cold volume shouldn't be actively used once the buckets have been evicted from there.


Splunk Employee

The index still requires a configured coldPath. See https://docs.splunk.com/Documentation/Splunk/7.3.0/Indexer/MigratetoSmartStore

Also, at the time that the index was migrated to SmartStore, any buckets that were in the coldPath continue to remain in the coldPath. See https://docs.splunk.com/Documentation/Splunk/7.3.0/Indexer/SmartStoreindexing


Influencer

I can confirm that once migration takes place, buckets are no longer stored on cold.
EDIT: I tried to force the CM to populate the cache with "cold buckets" but have failed to replicate this behavior. (Ran a search over a small window from months ago on a known index that would have been cold at the time of migration. No colddb population.)


Splunk Employee

Strictly speaking, it's true that the bucket contents will no longer be under coldPath, post-migration, as they are now stored remotely. But the bucket metadata should still be under coldPath, and bucket contents will get moved to coldPath if required to fulfill a search.


Influencer

The entirety of cold storage has 0 files in it. This remains true after running historical searches that would surely end up with some cold data in play.


Explorer

After migration to SmartStore, the data in coldPath is not automatically removed unless it is forced out through eviction or through the natural aging process. As Steve pointed out, the coldPath will have metadata stubs, and any searches that span the cold data will download the data from S3 back to the coldPath.

Alternatively, after migration, the coldPath location can be changed to some other location (or even homePath) as the idea for migration is to get only a single copy on to S3 and reclaim the space from the warm and cold tiers.


Contributor

This is spot on, and a behavior I hadn't understood until very recently. Reassigning coldPath to homePath is an excellent idea.


New Member

Afternoon. Sorry to resurrect this, but I want to better understand it. If I have hot and cold volumes and I migrate all indexes to S2, in theory I should be able to unmount my cold volume as long as I point coldPath at my home path? And my home path is basically where my warm and hot buckets are. Would that be correct?


Influencer

When my migration was complete, cold was empty and is no longer used for any bucket storage. None. Zero. I don't know if I'm a unicorn, or the posts above are not accurate. Once it's empty, point the location to whatever you want (that exists and has correct perms). It won't be used.


Explorer

I haven't seen this behavior yet. I am assuming you are testing in an indexer cluster environment. I will be testing some more this weekend. I will update.


Influencer

I've migrated 8 production indexer clusters (all I have). We started with the beta of S2 in 2 environments (7.1.x), moving to all others with 7.2.3/4. In every migration from non-S2 to S2, the cold storage was emptied after migration completed.


Explorer

This is completely odd. I just migrated some classic indexes that had data in ColdPath and even tried evicting all the buckets, ran a search and it brought the data back to ColdPath.


SplunkTrust

What you have mentioned does match the documentation/training...


SplunkTrust

Interesting, docs and training advise cold path is used for cache of the buckets that were in that location pre-migration...

I haven't tested this...
