Knowledge Management

SmartStore Behaviors

davidpaper
Contributor

I'd like to better understand what behaviors SmartStore will exhibit in my environment and how to manage them. What can I do to prepare my environment for SmartStore?

1 Solution

davidpaper
Contributor

S2 behaviors in no particular order. I will update this post as new information is learned.

  • RF/SF only apply to hot buckets. Once a bucket rolls, it is uploaded to S3 and any bucket replicas are marked for eviction.
  • The S2 cachemanager downloads individual components of a bucket as searches determine what's needed: bloom filters, deletes files, journal.* files, or other components. Because of this, it may look like multiple downloads are happening for the same bucket, but per component, no duplicate downloads should occur.
  • Evictions don't always show up in the Monitoring Console (MC) on the S2 pages. The following search will find them:

    index=_internal sourcetype=splunkd source=*splunkd.log action=evictDeletes

  • Starting in 7.2.4, additional metrics were added to track the number of bytes downloaded. Prior to that version, Splunk was metrics-blind to the (potentially significant) impact a rolling restart induces on the network and storage.

  • During a rolling restart, as each indexer is marked to go down:

      • The CM begins to reassign primacy for buckets on the indexer going down to other indexers.

      • All buckets on the indexer being restarted are marked for eviction, effectively flushing that indexer's cache.

      • As indexers in the cluster are restarted, the remaining indexers start downloading buckets from S3 to satisfy search requests. This can take a heavy toll on the local network and storage if you are not prepared for that level of data transfer in a short period of time, because all of the indexers not being restarted will likely start requesting bucket downloads at once (see the search sketch at the end of this list for one way to watch this).

  • SmartStore allows only one indexer at a time to be the primary searcher for a bucket, and no other indexers are allowed to keep cached copies of that bucket. The CM issues eviction notices to any indexers with local copies of that bucket. This ensures that only one indexer searches that bucket and returns results. As a consequence, a huge amount of data shuffling and downloading happens during a full cluster rolling restart.

  • Bucket rebalance works more quickly with S2 than without it, because the only buckets to rebalance are hot buckets.
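
To get a rough sense of that cache churn during a rolling restart, a search along these lines can help. This is only a sketch: it assumes the CacheManager events in splunkd.log carry an action field, as the evictDeletes search above suggests; exact field and action values may differ by version.

    index=_internal sourcetype=splunkd source=*splunkd.log CacheManager
    | timechart span=1m count by action

Run it over the restart window and watch for spikes; splitting by host instead of action shows which indexers take the brunt of the downloads.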

Added Nov 2019

  • Disk part 1: S2 disk I/O requirements tend to be higher than non-S2, because the bucket download process needs to write large amounts of data quickly as the cachemanager populates buckets for search. The default download config allows 8 simultaneous downloads at once (see the config sketch after this list). Disks previously able to shoulder the load may not be up to the task of S2's caching requirements. I'm looking at you, RAID5 volumes. By definition this is cache space (and hot bucket space, but hot is replicated), so use RAID 0 (striping) to get the fastest possible disk without wasting a MB of available space. RAID 10 (mirrored stripes) is also acceptable, but cuts usable disk space by 50%.
  • Disk part 2: To expand on the above a bit, S2 performance is about more than just high IOPS; it's about throughput too. Customers running S2 in AWS who have chosen gp2 EBS volumes for hot/cachemanager will likely see severe IO contention, with IO wait % jumping during periods of heavy S2 bucket downloads from remote storage. This is easy to see in top or iostat when users run searches that trigger large bucket evictions and bucket downloads from remote storage. gp2 has a limit of 250 MB/s, which doesn't take long to hit when the network is 10 gig or faster: a fast network means data is written to the kernel buffer cache at a high rate, and when it's time to sync to disk, the storage can't keep up. The io1 EBS type is better, at 1000 MB/s, but can still exhaust its throughput capacity during periods of concurrent heavy bucket downloads and searches that tax the storage for both reads and writes, on top of ingestion and hot bucket replication. In AWS, it is highly recommended to use NVMe for hot/cachemanager (i3 and i3en instance types work very well here) in RAID 0, and to consider setting RF/SF=3 (which still applies to hot buckets) to sleep better at night.
  • Disk part 3: If deploying S2 outside of AWS, strive to obtain the fastest disks (throughput and IOPS) available, whether local SSD or NVMe, to avoid storage bottlenecks getting in the way of your Splunk performance.
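
For reference, the settings mentioned above live in server.conf. A minimal sketch, with the values shown as illustrations only (the [cachemanager] lines are the defaults referenced in Disk part 1; check the docs for your version before copying):

    # server.conf on each indexer (defaults shown for illustration)
    [cachemanager]
    max_concurrent_downloads = 8
    max_concurrent_uploads = 8

    # server.conf on the cluster manager, for the RF/SF=3 suggestion in Disk part 2
    [clustering]
    mode = master
    replication_factor = 3
    search_factor = 3

Raising the download concurrency increases the parallel write load on the hot/cachemanager volume, which is exactly where the RAID and throughput advice above comes in.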


Steve_G_
Splunk Employee

Weird. Per developer, it's not supposed to work that way. I'll follow up and report back.

0 Karma

davidpaper
Contributor

Ah, this isn't really the case, but I can see how it might appear this way. There is now only "hot" and "not hot" in terms of a bucket lifecycle in S2. The concept of warm and cold being separate is no longer really a thing.

Hot (read/write) is still replicated based on the CM RF/SF settings until it rolls to read-only; then one copy of the bucket is uploaded to S3, and the other local copies are marked for deletion by the indexers' cachemanager process.

The cachemanager retrieves read-only buckets from S3 when it needs them so a search can complete, and those buckets share the same filesystem as hot...so make sure your hot/cachemanager filesystem is nice and fast.
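
For anyone setting this up, here is a minimal indexes.conf sketch of what that looks like; the volume name, bucket, endpoint, and index name are placeholders, not recommendations:

    # indexes.conf sketch -- names, paths, and endpoint are placeholders
    [volume:remote_store]
    storageType = remote
    path = s3://my-smartstore-bucket/indexes
    remote.s3.endpoint = https://s3.us-east-1.amazonaws.com

    [my_index]
    # homePath holds hot buckets and the downloaded cache, so this filesystem needs to be fast
    homePath = $SPLUNK_DB/my_index/db
    # coldPath is still required by Splunk, but is effectively unused with S2
    coldPath = $SPLUNK_DB/my_index/colddb
    thawedPath = $SPLUNK_DB/my_index/thaweddb
    remotePath = volume:remote_store/$_index_name

On a cluster this would be deployed from the cluster manager to the peers; the point for this thread is that hot buckets and the cache end up on the same volume.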

0 Karma

twinspop
Influencer

I'm not sure I follow. You don't have a choice of WARM or COLD with S2. There is HOT; briefly there is WARM while waiting to upload to remote; and finally there is remote with cached local copies. The entire bucket lifecycle changes.

At least this is my understanding.

0 Karma

ypeng_splunk
Splunk Employee

Hi David, this is a great session.
Today, one Splunk instance ran into some issues with SmartStore on top of on-prem object storage. It had worked normally since SmartStore was enabled several months ago. Most of the time, the indexing rate per indexer is about 8-10 MB/s. But during a spike (not sure how large yet), the indexer processor got stuck, consuming 100% CPU on the indexer. All pipelines were blocked and couldn't recover, and the indexing rate dropped to 2 MB/s. They restarted the indexer, and it went back to normal with an indexing rate of 16 MB/s.
Around 20 minutes before the congestion, errors like "DatabaseDirectoryManager - failed to open bucket/waif for bucket to be local through CacheManager" started to be reported by the indexer.
Their hot buckets are on SSD without RAID.

Any thought on this case?

0 Karma

ypeng_splunk
Splunk Employee

MC showed the major cause was ChillOrFreeze on the indexer, but the total data stored in SmartStore was well below maxGlobalDataSizeMB.
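
For context, maxGlobalDataSizeMB is only the size-based freeze trigger; time-based retention can fire ChillOrFreeze even when total SmartStore data is below the size cap. A sketch of the two indexes.conf settings involved (values are placeholders, not recommendations):

    # indexes.conf sketch -- values are placeholders
    [my_index]
    # size-based cap on the index across the cluster (0 = unlimited)
    maxGlobalDataSizeMB = 0
    # time-based retention; buckets older than this are frozen regardless of size
    frozenTimePeriodInSecs = 188697600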

0 Karma

woodcock
Esteemed Legend

There was a very serious bug in the SmartStore code that caused buckets to be accidentally deleted. See the (absurdly vague) headline regarding "might impact data durability in certain rare ..." in Fixed Issues here:
https://docs.splunk.com/Documentation/Splunk/7.2.4/ReleaseNotes/Fixedissues

0 Karma

jamie00171
Communicator

In my logs I see "deletes" files being downloaded. What is the deletes file in the bucket used for? Thanks

0 Karma

davidpaper
Contributor

That file stores the information used to keep events that have had "| delete" run against them in the past from showing up in search results.
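
For example, a deletes file gets written after something like the following (index and filter are placeholders; running delete requires a role with the can_delete capability):

    index=my_index sourcetype=bad_feed earliest=-24h
    | delete

Events matched by the search are no longer returned by later searches, and the deletes file is the per-bucket record of that.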

0 Karma

twinspop
Influencer

This is a really good rundown for anyone planning to use S2. Thanks for the summary @davidpaper!

0 Karma

woodcock
Esteemed Legend

@SloshBurch - We need a best practice wizard in here.

0 Karma

sloshburch
Splunk Employee

Thanks @woodcock. I hope to tackle SmartStore soon and will revisit this at that time.

0 Karma