Hi David, this is a great session.
Today, one Splunk instance ran into issues with SmartStore on top of on-prem object storage. It had worked normally since SmartStore was enabled several months ago. Most of the time, the indexing rate per indexer is about 8-10 MB/s. But during a spike (not sure how large yet), the indexer processor got stuck and consumed 100% CPU on the indexer. All pipelines were blocked and did not recover. The indexing rate dropped to 2 MB/s. They restarted the indexer, and it went back to normal with an indexing rate of 16 MB/s.
About 20 minutes before the congestion, the indexer started reporting errors like "DatabaseDirectoryManager - failed to open bucket/wait for bucket to be local through CacheManager".
Their hot buckets are on SSD without RAID.
Any thoughts on this case?