How can an Indexer best utilize a combination of S...

Lowell · ‎12-16-2013

Recent Splunk versions include several acceleration features that speed up common search scenarios: summary indexing (3.1?), bloom filters (4.3), report acceleration (5.0), and accelerated data models (6.0). Each of these techniques has a different sweet spot and still provides value today. Fundamentally, they all trade some additional storage for much faster search performance.

Fortunately, Splunk lets the admin control where all this additional storage is placed on the indexer via the indexes.conf file. However, this makes it harder to estimate disk usage and to decide which types of data belong on the fastest storage.

From a storage perspective, summary indexing is just a special-purpose index, so there's not much new to calculate there. The focus of my question is therefore on the search performance features in Splunk 4.3 or later.

Path-related indexes.conf settings:

Setting	Purpose	Advantage of fast storage
`homePath`	Hot/Warm storage	Recent events are available more quickly.
`coldPath`	Cold storage	Historic searches are quicker.
`bloomHomePath`	Bloom filters	?
`summaryHomePath`	Report Acceleration	?
`tstatsHomePath`	Data model Acceleration	?
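
To make the settings above concrete, here is a minimal indexes.conf sketch of one possible hybrid layout. It is an illustration, not a recommendation: the mount points (/mnt/ssd, /mnt/hdd), the volume names (fast, bulk), the index name (example_index), and the size caps are all hypothetical, and which auxiliary paths actually deserve the SSD is exactly the open question here. Verify each setting against the indexes.conf spec for your Splunk version before using it.

```
# Hypothetical volumes: one on SSD, one on HDD. Sizes are placeholders.
[volume:fast]
path = /mnt/ssd/splunk
maxVolumeDataSizeMB = 400000

[volume:bulk]
path = /mnt/hdd/splunk
maxVolumeDataSizeMB = 4000000

# Hypothetical index that splits its storage across the two volumes.
[example_index]
# Hot/warm buckets on the fast volume, cold buckets on the bulk volume.
homePath = volume:fast/example_index/db
coldPath = volume:bulk/example_index/colddb
# thawedPath is required; to my knowledge it cannot reference a volume.
thawedPath = $SPLUNK_DB/example_index/thaweddb
# Auxiliary acceleration data, placed on SSD here purely as an assumption.
bloomHomePath = volume:fast/example_index/bloom
summaryHomePath = volume:fast/example_index/summary
tstatsHomePath = volume:fast/example_index/datamodel_summary
```

Defining the paths in terms of volumes means each tier can be capped with maxVolumeDataSizeMB, which helps when the SSD is much smaller than the HDD pool.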

Splunk and SSDs

Now that SSDs are becoming more economical, with very clear performance advantages, it makes sense to incorporate them into a Splunk system. But the cost is still high enough that a hybrid SSD/HDD approach provides a better combination of retention and speed. So my question is twofold:

Which of the above acceleration techniques are best suited to fast storage? (Specifically, storage with the high IOPS that an SSD provides.)
What's a good way to estimate the size requirements for these different acceleration techniques?

My initial thought was simple: stick hot/warm data on SSDs and place the cold data on the HDDs. I think that makes sense, but the question I then had was which "auxiliary" data (bloom filters, summary data, tstats?) would benefit the most from faster storage. Real-life experience is preferred, but general insights into the typical I/O usage patterns would be helpful too.

woodcock · ‎09-01-2015

Here is an excellent slide deck that covers the most recent architecture advances in Splunk (e.g. clustering) with side-focus on how SSDs best fit into each:

http://www.slideshare.net/Splunk/taking-splunk-to-the-next-level-architecture-breakout-session-48015...

linu1988 · ‎01-08-2014

Yes, better to get the searching part done on SSD, but again, how do we know which data to keep on SSD and which on HDD? Mostly we search recent data, whereas the recent data keeps updating in the hot buckets! So we are looped back to the start of the discussion: where to use it?

aelliott · ‎01-08-2014

Writing to a solid-state drive constantly can have some negative effects too: it wears the drive out and greatly reduces its life.

jonvel · ‎01-08-2014

The response at http://answers.splunk.com/answers/10417/splunk-on-solid-state-disk was written in January 2011, when the sequential performance of SSDs wasn't much better than that of spindle disks ("only" up to about 2x as fast). However, modern PCIe-based SSDs can sustain substantially higher sequential transfer rates (5x-10x) than a single spindle disk, so that comment may no longer be valid.

Lowell · ‎12-17-2013

Thanks aelliott, the "[Bloom Filters are] 50-100x faster on conventional storage, >1000x faster on SSD" figure is good to know. As for the other answer, I was hoping to get an updated take on that (since it was written in 2011). Given that SSD prices are falling, and some RAID controllers were getting in the way of performance... I was hoping to get some recent feedback from people actually using them. Thanks!

aelliott · ‎12-16-2013

This PPT says that bloom filters would be 1000x faster on SSD:
http://blogs.splunk.com/wp-content/uploads/2011/07/SplunkSuperchargeYourSearchesWorkshop.pptx

Whereas this post says normal searches won't get much of a speed increase: http://answers.splunk.com/answers/10417/splunk-on-solid-state-disk

Lowell · ‎12-16-2013

Yeah, I had read over that. In fact, that's part of where this question came from. 😉 I'm assuming they had the bloom filters on SSD. But what does performance look like on a historic search if cold storage is on HDDs and only the bloom filters are on SSDs?

aelliott · ‎12-16-2013

Check out this blog post: http://blogs.splunk.com/2012/05/10/quantifying-the-benefits-of-splunk-with-ssds/
