If we plan to ingest roughly 10 GB of logs daily, how much extra data should we expect indexing on the Splunk server to add?

jonvel
Explorer

As we begin to plan our Splunk deployment, one thing is starting to puzzle me, and this is mostly a "should we consider SSDs" type of question. Based on some test data available elsewhere, even consumer-level SSDs are capable of writing 2+ petabytes of data before they die.

So, if our plan is to ingest roughly 10 GB of logs into our Splunk service daily, how much extra data should I expect the Splunk server to write for indexing and so on? An extra 5 GB? 10? 20?

I realize that this is probably a good "it depends" type of question, but are there any rough ballpark figures to go on? I was not able to find sizing information on the Splunk site (other than the minimum IOPS to run Splunk effectively). If the indexing overhead is minimal (or even equal to the ingest rate), that's still only 15, 20, or 30 GB per day, something that even a "decent" consumer-level SATA SSD can write for decades before wearing out. Even 1 TB of extra writes per day of log ingest is "fine"; I would just have to plan on replacing SSDs more often. Given that SSDs are incredibly cheap for the performance, and given that I already know SSDs eventually wear out (though so do HDDs), I'd like to have a regular, scheduled replacement plan for our storage infrastructure, along the lines of "plan on replacing at a rate of x per year". We do plan on investing in more business/enterprise-grade storage, though. I was just using a consumer SSD as a baseline.
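The wear-out arithmetic above can be sketched in a few lines of Python. This is only a rough model: the 2 PB endurance figure is the one quoted from the third-party tests, the function name is made up for illustration, and `write_amplification` is an assumed knob (1.0 means the drive physically writes exactly what the host sends, which real drives rarely achieve).

```python
def ssd_lifetime_years(endurance_gb, daily_writes_gb, write_amplification=1.0):
    """Years until the rated write endurance is used up at a steady write rate.

    endurance_gb        -- total data the drive can write before wearing out
    daily_writes_gb     -- data the host writes per day (ingest plus indexing)
    write_amplification -- ASSUMED multiplier for extra internal flash writes
    """
    if daily_writes_gb <= 0 or write_amplification <= 0:
        raise ValueError("daily writes and write amplification must be positive")
    physical_writes_per_day = daily_writes_gb * write_amplification
    return endurance_gb / physical_writes_per_day / 365


# 2 PB (decimal) expressed in GB, taken from the consumer SSD tests mentioned above.
ENDURANCE_GB = 2_000_000

# The daily write volumes considered in the post, from 15 GB up to 1 TB.
for daily_gb in (15, 20, 30, 1000):
    years = ssd_lifetime_years(ENDURANCE_GB, daily_gb)
    print(f"{daily_gb:>5} GB/day -> about {years:.1f} years")
```

Under these assumptions, 30 GB per day works out to well over a century, while 1 TB per day is closer to five and a half years, which is where a scheduled replacement plan starts to matter.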

Does anyone use SSDs for their entire Splunk storage array? I imagine ours would be a small-ish deployment (10 GB per day isn't that many logs), so getting something approaching SSD performance out of HDDs would mean spending a lot of money on massive spindle arrays that I just don't think we can justify.

martin_mueller
SplunkTrust

You're looking for this: http://docs.splunk.com/Documentation/Splunk/6.2.4/Capacity/HowSplunkcalculatesdiskstorage

In a nutshell, at 10 GB of incoming data per day you can roughly expect 5 GB of disk space taken up per day of data retention. Yes, that's less than the daily volume.
Some data types will yield much less, some will yield much more - if you need a more precise answer you should take a machine you have sitting around - can even be a laptop - plonk a trial Splunk on it and start indexing for a week or so. Then see what you have on disk and do the maths specific to your data.
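As a minimal sketch of "doing the maths" after such a trial run, something like the following would work. The function names are hypothetical, the 0.5 default is just the rough 5 GB per 10 GB figure from this answer, and the trial numbers in the usage lines are invented for illustration.

```python
def disk_ratio(on_disk_gb, ingested_gb):
    """Ratio of disk space used to raw data ingested, measured on a trial box."""
    if ingested_gb <= 0:
        raise ValueError("ingested volume must be positive")
    return on_disk_gb / ingested_gb


def estimate_disk_gb(daily_ingest_gb, retention_days, ratio=0.5):
    """Disk space needed to keep retention_days of data at a given on-disk ratio.

    ratio defaults to 0.5, the rough rule of thumb quoted above; replace it
    with the value measured on your own data.
    """
    return daily_ingest_gb * ratio * retention_days


# Rule-of-thumb estimate: 10 GB/day kept for 90 days.
print(estimate_disk_gb(10, 90))

# Invented trial result: 70 GB ingested over a week, 31.5 GB found on disk.
measured = disk_ratio(on_disk_gb=31.5, ingested_gb=70)
print(estimate_disk_gb(10, 90, ratio=measured))
```

The measured ratio is the only input that really depends on your data, so it is worth taking it from a trial rather than from the default.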

sk314
Builder

In addition, look under the "Storage requirement examples" section at the following link for Splunk indexer clusters: http://docs.splunk.com/Documentation/Splunk/6.2.4/Indexer/Systemrequirements

jonvel
Explorer

While that's a great measure of how much space I ultimately need, and answers that part of the question, the other part is this: given that SSDs wear out over time, and that wear is a function of how much data you write to them (even deleting and overwriting), what is the expected write pattern? Is it "write once and forget about it", or does Splunk constantly update the indexes?

dflodstrom
Builder

It is often advised to store your hot/warm buckets on the fastest storage available, while cold/frozen buckets can be moved to cheaper storage. This is a popular approach when logs must be retained for a long time but are only searched over a recent window, such as the last 90 days or 1 year, maybe more or less depending on the situation and available resources.
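To get a feel for how much fast storage such a split would need, here is a rough sketch. It assumes buckets move to the cheaper tier purely by age, which is a simplification, and it reuses the rough 0.5 on-disk ratio mentioned earlier in the thread; the function name and the 90/365-day figures are only illustrative.

```python
def tier_storage_gb(daily_ingest_gb, fast_days, total_days, ratio=0.5):
    """Split the total on-disk footprint between a fast and a cheap tier.

    fast_days  -- how long data stays on fast storage (hot/warm)
    total_days -- total retention, including time on cheaper storage
    ratio      -- ASSUMED on-disk size per GB ingested (0.5 is a rough guess)
    """
    if fast_days > total_days:
        raise ValueError("fast_days cannot exceed total_days")
    per_day_gb = daily_ingest_gb * ratio
    return {
        "fast_gb": per_day_gb * fast_days,
        "cheap_gb": per_day_gb * (total_days - fast_days),
    }


# Illustrative only: 10 GB/day, 90 days searched regularly, 1 year kept overall.
sizes = tier_storage_gb(daily_ingest_gb=10, fast_days=90, total_days=365)
print(f"fast tier:  {sizes['fast_gb']:.0f} GB")
print(f"cheap tier: {sizes['cheap_gb']:.0f} GB")
```

At this ingest rate the fast tier stays under half a terabyte, so keeping only the frequently searched window on SSD looks affordable even on a small budget.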

ppablo
Retired

Hi @jonvel

This isn't my area of expertise, but a resource I've seen tossed around for sizing estimates is https://splunk-sizing.appspot.com

It wasn't created by Splunk, but still a useful tool to play around with for planning.
