
How can an Indexer best utilize a combination of SSD/HDD storage?

Super Champion

Recent Splunk versions include several acceleration features that speed up common search scenarios: summary indexing (3.1?), bloom filters (4.3), report acceleration (5.0), and accelerated data models (6.0). Each of these techniques has a different sweet spot and still provides value today. Fundamentally, they all trade some additional storage for much faster search performance.

Fortunately, Splunk lets the admin control where all this additional storage is placed on the Indexer via the indexes.conf file. However, this makes it hard to estimate disk usage and to decide which type of data belongs on the fastest storage.

From a storage perspective, summary indexing is just a special-purpose index, so there's not much new to calculate there. The focus of my question is therefore on the search performance features introduced in Splunk 4.3 or later.

Path-related indexes.conf settings:

| Setting | Purpose | Advantage of fast storage |
|---|---|---|
| homePath | Hot/Warm storage | Recent events are available more quickly. |
| coldPath | Cold storage | Historic searches are quicker. |
| bloomHomePath | Bloom filters | ? |
| summaryHomePath | Report Acceleration | ? |
| tstatsHomePath | Data model Acceleration | ? |

Splunk and SSDs

Now that SSDs are becoming more economical, with very clear performance advantages, it makes sense to incorporate them into a Splunk system. But the cost is still high enough that a hybrid SSD/HDD approach provides a better combination of retention and speed. So my question is twofold:

  1. Which of the above acceleration techniques are best suited to fast storage? (Specifically, storage with the high IOPS that an SSD provides.)
  2. What's a good way to estimate the size requirements for these different acceleration techniques?

My initial thought was simple: put hot/warm data on SSDs and cold data on HDDs. I think that makes sense, but the question I then had was which "auxiliary" data (bloom filters, summary data, tstats?) would benefit the most from faster storage. Real-life experience is preferred, but general insights into the typical I/O usage patterns would be helpful too.
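A minimal sketch of that starting layout in indexes.conf. The mount points, volume names, size caps, and the index name `web_logs` are all hypothetical placeholders, not values from a real deployment. The auxiliary paths are left commented out, since where they belong is exactly the open question here.

```ini
# indexes.conf (sketch only; every path, name, and size below is an assumption)

# Fast tier: hypothetical SSD mount point and size cap
[volume:ssd]
path = /mnt/ssd/splunk
maxVolumeDataSizeMB = 400000

# Capacity tier: hypothetical HDD mount point and size cap
[volume:hdd]
path = /mnt/hdd/splunk
maxVolumeDataSizeMB = 4000000

# Hypothetical index: hot/warm buckets on SSD, cold buckets on HDD
[web_logs]
homePath   = volume:ssd/web_logs/db
coldPath   = volume:hdd/web_logs/colddb
# thawedPath cannot reference a volume, so it uses a plain path
thawedPath = $SPLUNK_DB/web_logs/thaweddb

# Candidates for the fast tier (undecided, shown for illustration):
# bloomHomePath   = volume:ssd/web_logs/bloom
# summaryHomePath = volume:ssd/web_logs/summary
# tstatsHomePath  = volume:ssd/web_logs/datamodel_summary
```

With volumes defined this way, moving one of the auxiliary paths between tiers is a one-line change, which makes it easier to test each option separately.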


Re: How can an Indexer best utilize a combination of SSD/HDD storage?

Motivator

Re: How can an Indexer best utilize a combination of SSD/HDD storage?

Super Champion

Yeah, I had read over that. In fact, that's part of where this question came from. 😉 I'm assuming they had the bloom filters on SSD. But what does performance look like on a historic search if cold storage is on HDDs and only the bloom filters are on SSDs?


Re: How can an Indexer best utilize a combination of SSD/HDD storage?

Motivator

This PowerPoint deck says that bloom filters would be 1000x faster on SSD:
http://blogs.splunk.com/wp-content/uploads/2011/07/SplunkSuperchargeYourSearchesWorkshop.pptx

Whereas this post says normal searches won't see much of a speed increase: http://answers.splunk.com/answers/10417/splunk-on-solid-state-disk


Re: How can an Indexer best utilize a combination of SSD/HDD storage?

Super Champion

Thanks aelliot, the "[Bloom Filters are] 50-100x faster on conventional storage, >1000x faster on SSD" is good to know. As for the other answer, I was hoping to get an updated take on that (since it was written in 2011). Given that SSD prices are falling, and some RAID controllers were getting in the way of performance... I was hoping to get some recent feedback from people actually using them. Thanks!


Re: How can an Indexer best utilize a combination of SSD/HDD storage?

Explorer

The response at http://answers.splunk.com/answers/10417/splunk-on-solid-state-disk was written in January of 2011, when the sequential performance of SSDs wasn't much better than that of spindle disks ("only" up to about 2x as fast). However, modern PCIe-based SSDs can sustain substantially higher sequential transfer rates (5x-10x those of a single spindle disk), so that comment may no longer be valid.


Re: How can an Indexer best utilize a combination of SSD/HDD storage?

Motivator

Writing to a solid-state drive constantly can have negative effects too: it wears the drive out and greatly reduces its life.
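To put a rough number on the wear concern, a back-of-the-envelope estimate divides the drive's rated endurance by the expected write volume. This is a sketch only: the function name and all input values are hypothetical, the rated endurance (terabytes written, TBW) has to come from the drive's datasheet, and the write amplification factor is an assumption that varies with workload.

```python
def ssd_lifetime_years(rated_tbw, daily_write_gb, write_amplification=1.0):
    """Estimate years until the rated write endurance is used up.

    rated_tbw           -- endurance rating in terabytes written (from the datasheet)
    daily_write_gb      -- data written to the drive per day, in GB
    write_amplification -- assumed ratio of physical to logical writes
    """
    if rated_tbw <= 0 or daily_write_gb <= 0 or write_amplification <= 0:
        raise ValueError("all inputs must be positive")
    # Physical terabytes written per day (1 TB = 1000 GB)
    tb_per_day = daily_write_gb * write_amplification / 1000.0
    return rated_tbw / tb_per_day / 365.0


if __name__ == "__main__":
    # Hypothetical inputs: 1200 TBW rating, 100 GB/day written, 2x write amplification
    print(round(ssd_lifetime_years(1200, 100, 2.0), 1))
```

Keep in mind that the daily write volume on an Indexer is usually larger than the raw ingest alone, because index files and acceleration summaries are written as well.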


Re: How can an Indexer best utilize a combination of SSD/HDD storage?

Champion

Yes, it's better to have the searching done on SSD, but again, how do we know which data to keep on SSD and which on HDD? Most searches hit recent data, while that recent data is constantly being written to the hot buckets. So we are back at the start of the discussion: where should the SSDs be used?


Re: How can an Indexer best utilize a combination of SSD/HDD storage?

Esteemed Legend

Here is an excellent slide deck that covers the most recent architecture advances in Splunk (e.g. clustering), with a side focus on how SSDs best fit into each:

http://www.slideshare.net/Splunk/taking-splunk-to-the-next-level-architecture-breakout-session-48015...
