Thanks for the quick replies! Sorry I didn't supply all the details first.
The exact requirements are 1 year of quick(er) search response (warm), another 2-3 years of slow(er) search response (cold), and another 5-6 years frozen that has to be thawed before it can be used. So yes, we are expected to be able to search up to 4 years back without thawing.
The searchable raw data sets will be 12TB warm and 24-36TB cold. Per the sizing tool, the estimated physical space actually used will be 6-7TB warm and 2-3x that for cold. I'm expecting to use either SSD or fast spinning disk for the warm, and dedicated NFS for the cold and frozen ... assuming fast NFS is good enough for cold, given light searching.
I've read that SSD (for the warm space) would be great for the sparse searches, but it would depend on whether it's cost-effective if we don't search much. It's not supposed to be an everyday tool, but one used in response to incidents. A few ad hoc searches (of the entire warm space) need to complete in hours, not days. Searches in the cold space that take days (not weeks) are acceptable as well. (...Of course, that's until they discover the usefulness of the tool and change their minds about how much they use it... 😉)
So, if I'm clear on the overhead for indexing alone in our use case:
1) We have to write 35GB/day (raw) to the hot tier.
2) It is compressed and indexed to ~18GB/day and written to warm.
3) Data that is 1 year old (~18GB/day) has to be found, read and re-written/re-indexed to cold.
4) Data that is 4 years old (~18GB/day) has to be found, frozen and moved to frozen space.
So both the warm and cold spaces have to write (including deletes) 36GB/day each. This seems trivial in our case because it's a small daily data set.
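Sanity-checking my own numbers, the lifecycle above pencils out roughly like this (a back-of-the-envelope sketch; the ~18GB/day indexed figure is the sizing-tool estimate, not a measurement):

```python
# Rough tier-sizing arithmetic for the lifecycle described above.
# INDEXED_GB_PER_DAY (~18) comes from the sizing-tool estimate of
# compressed + indexed size for 35GB/day of raw input.
RAW_GB_PER_DAY = 35
INDEXED_GB_PER_DAY = 18

WARM_YEARS = 1
COLD_YEARS = 3  # upper end of the 2-3 year range

warm_tb = INDEXED_GB_PER_DAY * 365 * WARM_YEARS / 1000   # ~6.6TB, matches 6-7TB estimate
cold_tb = INDEXED_GB_PER_DAY * 365 * COLD_YEARS / 1000   # ~19.7TB, matches the ~20TB figure

# Daily churn per tier: new data written in, aged data rolled out.
warm_daily_churn_gb = INDEXED_GB_PER_DAY * 2  # in from hot, out to cold
cold_daily_churn_gb = INDEXED_GB_PER_DAY * 2  # in from warm, out to frozen

print(f"warm ~{warm_tb:.1f}TB, cold ~{cold_tb:.1f}TB")
print(f"daily churn: warm {warm_daily_churn_gb}GB, cold {cold_daily_churn_gb}GB")
```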
I'm thinking the real overhead is searching the indices for the oldest data to move. I'm assuming that happens multiple times a day, so the bigger the space we scan, the more work is required. Will 900 IOPS cover it for either warm or cold, given a 20TB cold space and a 7TB warm space? Will a single "reference" hardware indexer handle the CPU requirements, with enough headroom left over for light/slow searches?
I have a side question. If someone focuses their search on 3-year-old data, it won't be moved into the warm space automatically, will it?
I'm hoping not, as that would bump the newer data down to cold for a short-term gain.
Thanks again for all your thoughts.
We are intending to ingest about 35GB/day into Splunk Enterprise. That can easily be handled by a single "reference" hardware indexer, even with some searching. However, our retention time is very long compared to most use cases I've read about. We intend to have 1 year of hot (12TB/yr) and 2-3 years of cold (24-36TB).
I've read that there is significant overhead in just indexing and holding large amounts of data, without even searching it.
Is there a formula for how many "reference" indexers are required (per TB) to just hold data?
Note that we do not intend to search it often; it is mainly intended for incident response. Of course we need to consider the searching (and are looking at SSD for the hot tier), but that would be on top of what we need just to hold it.
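For reference, the raw retention arithmetic behind those numbers (a quick sketch, before any compression or index overhead is applied):

```python
# Raw-retention arithmetic behind the figures above (no compression applied).
INGEST_GB_PER_DAY = 35

hot_tb_per_year = INGEST_GB_PER_DAY * 365 / 1000
# ~12.8TB/yr of raw data; rounded to 12TB/yr in the post above.
print(f"hot: ~{hot_tb_per_year:.1f}TB/yr")
# 2-3 years of cold at the same raw rate.
print(f"cold (2-3 yrs): ~{2 * hot_tb_per_year:.1f}-{3 * hot_tb_per_year:.1f}TB")
```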