As we begin to plan our Splunk deployment, one thing puzzles me, and it's mostly a "should we consider SSDs" type of question. Based on test data available elsewhere, even consumer-level SSDs can write 2+ petabytes of data before they die.
Our plan is to ingest roughly 10 GB of logs into our Splunk service daily. How much extra data should I expect to be written by indexing and other work done by the Splunk server? An extra 5 GB? 10? 20?
I realize this is probably an "it depends" type of question, but are there any rough ballpark figures to go on? I wasn't able to find sizing information on the Splunk site (other than minimum IOPS to run Splunk effectively). If the indexing overhead is small relative to the ingest rate (or even equal to it), that's still only 15, 20, or 30 GB per day, something even a "decent" consumer-level SATA SSD can sustain for decades before wearing out. Even 1 TB of extra writes per day of log ingest would be fine; I'd just have to plan on replacing SSDs more often. Given that SSDs are incredibly cheap for the performance, and given that I already know SSDs eventually wear out (though so do HDDs), I'd like a regular scheduled replacement plan for our storage infrastructure, along the lines of "plan on replacing x drives per year". We do plan on investing in more business/enterprise-grade storage, though; I was just using a consumer SSD as a baseline.
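The wear math above can be sketched as a quick back-of-the-envelope calculation. Note the numbers are assumptions for illustration: the 2 PB endurance figure comes from the consumer-drive tests mentioned earlier, and the 3x write amplification factor is a pessimistic guess, not a measured value.

```python
# Back-of-the-envelope SSD wear estimate (all figures are assumptions).
endurance_tb = 2000          # assumed drive endurance: 2 PB = 2000 TB written
daily_writes_gb = 30         # pessimistic: 10 GB ingest plus indexing overhead
write_amplification = 3      # assumed worst-case controller write amplification

effective_daily_tb = daily_writes_gb * write_amplification / 1000
years_to_wear_out = endurance_tb / effective_daily_tb / 365
print(f"{years_to_wear_out:.0f} years")   # → about 61 years at these rates
```

Even with deliberately pessimistic inputs, wear-out at this ingest volume lands well beyond any realistic drive replacement schedule.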
Does anyone use SSDs for their entire Splunk storage array? I imagine ours would be smallish (10 GB per day isn't that many logs), so getting something approaching SSD performance out of HDDs would mean spending a lot of money on massive spindle arrays that I just don't think we can justify.
You're looking for this: http://docs.splunk.com/Documentation/Splunk/6.2.4/Capacity/HowSplunkcalculatesdiskstorage
In a nutshell, you can roughly expect 5 GB of disk space consumed per day of data retention at 10 GB of incoming data. Yes, that's less than the daily volume.
Some data types will yield much less, some much more. If you need a more precise answer, take a machine you have sitting around (it can even be a laptop), plonk a trial Splunk on it, and start indexing for a week or so. Then see what you have on disk and do the maths for your specific data.
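The rule of thumb behind that docs page (roughly 15% of raw volume for the compressed rawdata plus about 35% for the index files, both of which vary a lot by data type) can be turned into a quick estimator; the function name and default ratios here are illustrative, not anything Splunk ships:

```python
# Rough on-disk footprint using Splunk's published rule of thumb:
# stored size ≈ 15% of raw volume (compressed rawdata) + 35% (index files).
def estimated_disk_gb(daily_ingest_gb, retention_days,
                      rawdata_ratio=0.15, index_ratio=0.35):
    """Rough estimate only; real ratios vary widely by data type."""
    return daily_ingest_gb * (rawdata_ratio + index_ratio) * retention_days

print(estimated_disk_gb(10, 1))    # ≈ 5 GB per day retained
print(estimated_disk_gb(10, 365))  # ≈ 1825 GB for a year of retention
```

The trial-indexing exercise above is what gives you real ratios to plug in place of the defaults.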
While that's a great measure of how much total space I need, and answers that part of the question, the other part remains: given that SSDs wear out over time as a function of how much data is written to them (including deletes and overwrites), what does the write pattern look like? Is it "write once and forget", or does Splunk constantly update the indexes?
It is often advised to keep your hot/warm buckets on the fastest storage available, while cold/frozen buckets can be moved to cheaper storage. This is a popular approach when logs must be retained for a long time but are only searched over the last 90 days or 1 year, give or take depending on the situation and available resources.
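As a sketch of what that tiering looks like in indexes.conf (the index name and mount points here are hypothetical; adjust them to your own volumes):

```ini
# indexes.conf sketch - hot/warm on SSD, cold on cheaper spinning disk
[main_logs]
homePath   = /ssd/splunk/main_logs/db        # hot/warm buckets on SSD
coldPath   = /hdd/splunk/main_logs/colddb    # cold buckets on HDD
thawedPath = /hdd/splunk/main_logs/thaweddb
# roll data to frozen (deleted or archived) after roughly 1 year
frozenTimePeriodInSecs = 31536000
```

Splunk rolls buckets from hot to warm to cold automatically, so the SSD tier only ever holds the recent, frequently searched data.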
This isn't my area of expertise, but a resource I've seen tossed around for sizing estimates is https://splunk-sizing.appspot.com
It wasn't created by Splunk, but still a useful tool to play around with for planning.