SmartStore Behaviors

Contributor

I'd like to better understand what behaviors SmartStore is going to exhibit in my environment and how I can manage them. What can I do to prepare my environment for SmartStore?

1 Solution

Contributor

S2 behaviors in no particular order. I will update this post as new information is learned.

  • RF/SF only apply to hot buckets. Once a bucket is rolled, it is uploaded to S3 and any bucket replicas are marked for eviction.
  • The S2 cachemanager downloads individual components of a bucket (bloomfilters, deletes, journal.*, and so on) as searches determine what's needed. Multiple downloads for the same bucket may appear to be happening, but downloads are per component, and no component should be downloaded twice.
  • Evictions don't always seem to show up in the MC on the S2 pages. The following search will find them:

    index=_internal sourcetype=splunkd source=*splunkd.log action=evictDeletes

  • Starting in 7.2.4, additional metrics were added to count downloaded bytes. Prior to this version, Splunk was metrics-blind to the (potentially significant) impact a rolling restart induces on the network and storage. (See the search sketch just after this list.)

  • During a rolling restart, as each indexer is marked to go down:

  • Hot buckets are rolled to warm and uploaded to S3 before process shutdown completes; if upload speed is slow, this can delay restarts.

  • The CM begins to reassign primacy for buckets on the indexer that is going down to other indexers.

  • All buckets on the indexer being restarted are marked for eviction, effectively flushing that indexer's cache.

  • As indexers in the cluster are restarted, the others start downloading buckets from S3 to satisfy search requests. This can take a heavy toll on the local network and storage if you are not prepared for that level of data transfer in a short period, because all of the indexers not being restarted will likely start requesting bucket downloads at once.

  • SmartStore allows only one indexer at a time to be primary searchable for a bucket, and no other indexers are allowed to have copies of that bucket cached. The CM will issue eviction notices to any indexers with local copies of that bucket. This ensures that only one indexer will search that bucket and return results. As a result, a huge amount of data shuffling and downloading happens during a full cluster rolling restart.

  • Bucket rebalance works more quickly with S2 than without it, because the only buckets to rebalance are hot buckets.
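
For the download metrics mentioned above, a search along these lines gives a rough view of CacheManager activity (a sketch only; the byte-count metrics added in 7.2.4 and their exact field names vary by version, so verify against your own _internal data):

    index=_internal sourcetype=splunkd source=*splunkd.log component=CacheManager
    | timechart span=1m count by host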

Added Nov 2019

  • Disk part 1: S2 disk I/O requirements seem to be higher than non-S2, because the bucket downloading process needs to write large amounts of data quickly as the cachemanager populates buckets for search. The default download configuration allows 8 simultaneous downloads at once (see the config sketch after this list). Disks previously able to shoulder the load may not be up to the task of S2's caching requirements. I'm looking at you, RAID5 volumes. By definition this is cache space (and hot bucket space, but hot is replicated), so use RAID0 (stripe) for the fastest possible disk without wasting a MB of available space. RAID10 (mirrored stripes) is also acceptable, but cuts usable disk space by 50%.
  • Disk part 2: To expand on the above a bit, S2 performance is about more than high IOPS; it's about throughput too. Customers running S2 in AWS on gp2 EBS volumes for hot/cachemanager will likely see severe IO contention, with IO wait % jumping during periods of heavy S2 bucket downloads from remote storage. This is quite easy to see in top or iostat when users run searches that trigger large bucket evictions and bucket downloads from remote storage. gp2 has a limit of 250MB/sec, which doesn't take long to hit when the network is 10 gig or faster: a fast network means data is written to the kernel buffer cache at a high rate, and when it's time to sync to disk, the storage won't be able to keep up. The io1 EBS type is better, at 1000MB/s, but can still exhaust its throughput capacity during periods of concurrent heavy bucket downloads and searches that tax the storage for both reads and writes, on top of ingestion and hot bucket replication. In AWS, it is highly recommended to use NVMe for hot/cachemanager (i3 and i3en instance types work very well here) in RAID 0, and to consider setting RF/SF=3 (which still applies to hot buckets) to sleep better at night.
  • Disk part 3: If deploying S2 outside of AWS, strive to obtain the fastest disks (throughput and IOPS) available, whether local SSD or NVMe, to avoid storage bottlenecks getting in the way of your Splunk performance.
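
As a reference for the download concurrency mentioned in Disk part 1, the relevant cachemanager settings live in server.conf on each indexer. A minimal sketch with illustrative values, not tuning recommendations:

    # server.conf on each indexer -- illustrative values, not recommendations
    [cachemanager]
    # default is 8 concurrent bucket downloads; raising this increases
    # the burst load on local disk and network
    max_concurrent_downloads = 8
    # 0 means no explicit cache size limit; eviction is driven by free space
    max_cache_size = 0
    # headroom (in MB) to keep free on the cache volume before evicting
    eviction_padding = 5120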


Splunk Employee

Weird. Per the developer, it's not supposed to work that way. I'll follow up and report back.


Contributor

Ah, this isn't really the case, but I can see how it might appear this way. In the S2 bucket lifecycle there is now only "hot" and "not hot"; the concept of warm and cold as separate tiers is no longer really a thing.

Hot (read/write) is still replicated based on the CM RF/SF settings until it rolls to read-only; then one copy of the bucket is uploaded to S3, and the other local copies are marked for deletion by the indexers' cachemanager process.

The cachemanager retrieves read-only buckets from S3 when it needs them so a search can be completed, and those buckets share the same filesystem as hot, so make sure your hot/cachemanager filesystem is nice and fast.
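
For context, here is a minimal indexes.conf sketch of a SmartStore-enabled index; the volume name, bucket URL, and endpoint are hypothetical placeholders:

    # indexes.conf -- hypothetical names and paths
    [volume:remote_store]
    storageType = remote
    path = s3://my-smartstore-bucket/indexes
    remote.s3.endpoint = https://s3.us-east-1.amazonaws.com

    [main]
    remotePath = volume:remote_store/$_index_name
    # homePath holds hot buckets and the locally cached copies of
    # remote buckets, so put it on the fastest storage available
    homePath = $SPLUNK_DB/main/db
    coldPath = $SPLUNK_DB/main/colddb
    thawedPath = $SPLUNK_DB/main/thaweddb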


Influencer

I'm not sure I follow. You don't have a choice of WARM or COLD with S2. There is HOT; briefly there is WARM while waiting to upload to remote storage; and finally there is remote with cached local copies. The entire bucket lifecycle changes.

At least this is my understanding.



Splunk Employee

Hi David, this is a great session.
Today, one Splunk instance identified some issues with SmartStore on top of on-prem object storage. It had worked normally since SmartStore was enabled several months ago. Most of the time, the indexing rate per indexer is about 8-10MB/s, but during a spike (not sure how large yet), the indexer processor got stuck and consumed 100% CPU on the indexer. All pipelines were blocked and couldn't recover, and the indexing rate dropped to 2MB/s. They restarted the indexer, and it went back to normal with an indexing rate of 16MB/s.
Around 20 minutes before the congestion, the indexer started reporting errors like "DatabaseDirectoryManager - failed to open bucket/waif for bucket to be local through CacheManager".
Their hot buckets are on SSD without RAID.
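
A search along these lines can show whether those errors spiked before the congestion (a sketch; adjust the component and message text to match what your splunkd.log actually reports):

    index=_internal sourcetype=splunkd component=DatabaseDirectoryManager (log_level=ERROR OR log_level=WARN)
    | timechart span=5m count by host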

Any thoughts on this case?


Splunk Employee

The MC showed the major cause was ChillOrFreeze on the indexer, but the total data stored in SmartStore was well below maxGlobalDataSizeMB.
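
To compare each index's configured limits and current size across indexers, a REST search like this can help (a sketch; field availability varies by version):

    | rest splunk_server=* /services/data/indexes
    | table splunk_server title currentDBSizeMB maxGlobalDataSizeMB frozenTimePeriodInSecs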


Esteemed Legend

There was a very serious bug in the SmartStore code that caused buckets to be accidentally deleted. See the (absurdly vague) headline regarding "might impact data durability in certain rare ..." under Fixed Issues here:
https://docs.splunk.com/Documentation/Splunk/7.2.4/ReleaseNotes/Fixedissues


Explorer

In my logs I see "deletes" files being downloaded. What is the deletes file in a bucket used for? Thanks


Contributor

That file stores the information used to keep events that have had "| delete" run against them from showing up in search results.
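
For context, that metadata gets written when someone runs the delete command against events, along these lines (requires the can_delete role; the index and search terms here are hypothetical):

    index=web_logs sourcetype=access_combined status=999
    | delete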


Influencer

This is a really good rundown for anyone planning to use S2. Thanks for the summary @davidpaper!


Esteemed Legend

@SloshBurch - We need a best practice wizard in here.


Ultra Champion

Thanks @woodcock. I hope to tackle SmartStore soon and will revisit this at that time.
