Recent Splunk versions include many acceleration technologies to speed up common search scenarios, such as summary indexing (3.1?), bloom filters (4.3), report acceleration (5.0), and accelerated data models (6.0). Each of these techniques has a different sweet spot and still provides value today. Fundamentally, they all trade some additional storage for much faster search performance.
Fortunately, Splunk allows the admin to control where all this additional storage gets placed on the indexer via the indexes.conf file. However, this makes it difficult to estimate disk usage and to decide what type of data should be placed on the fastest storage.
From a storage perspective, summary indexing is just a special-purpose index, so there's not much new to calculate there. The focus of my question is therefore on the search performance features in Splunk 4.3 or later.
Path-related indexes.conf settings:
| Setting | Purpose | Advantage of fast storage |
|---|---|---|
| homePath | Hot/Warm storage | Recent events are available more quickly. |
| coldPath | Cold storage | Historic searches are quicker. |
| bloomHomePath | Bloom filters | ? |
| summaryHomePath | Report Acceleration | ? |
| tstatsHomePath | Data model Acceleration | ? |
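To make the settings above concrete, here is a minimal indexes.conf sketch of one possible hybrid layout. The mount points (/mnt/ssd, /mnt/hdd), the volume names, and the index name my_index are assumptions for illustration only, and putting all three auxiliary paths on the SSD volume is just one option to compare against, not a recommendation.

```ini
# Hypothetical volumes; the mount points and names are assumptions.
[volume:ssd]
path = /mnt/ssd/splunk

[volume:hdd]
path = /mnt/hdd/splunk

# Hypothetical index name.
[my_index]
# Hot/warm buckets on the fast volume, cold buckets on the slow one.
homePath   = volume:ssd/my_index/db
coldPath   = volume:hdd/my_index/colddb
# thawedPath is required and cannot reference a volume.
thawedPath = $SPLUNK_DB/my_index/thaweddb
# Auxiliary data; whether each of these benefits from SSD is the open question.
bloomHomePath   = volume:ssd/my_index/bloom
summaryHomePath = volume:ssd/my_index/summary
tstatsHomePath  = volume:ssd/my_index/datamodel_summary
```

Moving any one of the last three lines to volume:hdd would be the way to test each auxiliary data type separately.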
Splunk and SSDs
Now that SSDs are becoming more economical, with very clear performance advantages, it makes sense to incorporate them into a Splunk system. But the cost is still high enough that a hybrid SSD/HDD approach still provides a better combination of retention and speed. So my question is two-fold:
My initial thought was simple: put hot/warm data on SSDs and place the cold data on the HDDs. I think that makes sense, but the question I then had was which "auxiliary" data (bloom filters, summary data, tstats?) would benefit the most from faster storage. Real-life experience is preferred, but general insights into the typical I/O usage patterns would be helpful too.
Here is an excellent slide deck that covers the most recent architecture advances in Splunk (e.g. clustering) with side-focus on how SSDs best fit into each:
Yes, better to get the searching part done on SSD, but again, how do we know which data to keep on SSD and which on HDD? Mostly we search the recent data, whereas the recent data keeps updating in the hot buckets! So we are looped back to the start of the discussion about where to use it!
Constantly writing to a solid-state drive can have negative effects too: it wears the drive out and greatly reduces its life.
The response for http://answers.splunk.com/answers/10417/splunk-on-solid-state-disk was written in January of 2011, when the sequential performance of SSDs wasn't particularly better than that of spindle disks ("only" up to about 2x as fast). However, modern PCIe-based SSDs can sustain substantially higher sequential transfer rates (5x-10x) than a single spindle disk, so that comment may not be valid anymore.
Thanks aelliot, the "[Bloom Filters are] 50-100x faster on conventional storage, >1000x faster on SSD" is good to know. As for the other answer, I was hoping to get an updated take on that (since it was written in 2011). Given that SSD prices are falling, and some RAID controllers were getting in the way of performance... I was hoping to get some recent feedback from people actually using them. Thanks!
This ppt says that they would be 1000x faster on SSD:
http://blogs.splunk.com/wp-content/uploads/2011/07/SplunkSuperchargeYourSearchesWorkshop.pptx
Whereas this post says normal searches won't get a lot of increase in speed: http://answers.splunk.com/answers/10417/splunk-on-solid-state-disk
Yeah, I had read over that. In fact, that's part of where this question came from. 😉 I'm assuming they had the bloom filters on SSD. But what does performance look like on a historic search if cold storage is on HDDs and only the bloom filters are on SSDs?
Check out this blog post: http://blogs.splunk.com/2012/05/10/quantifying-the-benefits-of-splunk-with-ssds/