We are planning to ingest close to 100 GB/day for the next 2 years. Eventually, we estimate that ingestion will reach 300 GB/day. The requirement is to have data available/online for 90 days (hot/warm/cold), while all data older than 90 days shall be frozen and archived for 10 years. Yes, that is a major storage requirement, but given that our Splunk components will be set up in the AWS Cloud environment, we do have the ability to scale up storage over time.
We have 4 indexers (installed on RHEL Linux instances) in a cluster, each with 16 CPUs and 64 GB RAM. The indexers are expected to achieve at least 800 IOPS at full capacity, but not much more, at least for now, since we are only planning to ingest 100 GB/day. The plan is to store the hot, warm, and cold buckets on a single volume, while the frozen/archived data shall be stored on a second volume.
Based on these details, which RAID level and how many disks per volume would you recommend for this setup?
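For context, here is my rough sizing math for the hot/warm/cold volume. The ~50% on-disk compression ratio and a replication factor of 2 are assumptions on my part (not figures from this thread), so treat this as a sketch to validate against real data:

```python
# Rough per-indexer storage estimate for the hot/warm/cold volume.
# Assumptions (not from the question): ~50% of raw ingest lands on disk
# after Splunk compression/indexing, and a cluster replication factor of 2.

def per_indexer_storage_gb(daily_ingest_gb, retention_days,
                           compression_ratio, replication_factor,
                           num_indexers):
    """Estimated hot/warm/cold disk needed on each indexer, in GB."""
    total_gb = (daily_ingest_gb * retention_days
                * compression_ratio * replication_factor)
    return total_gb / num_indexers

# Figures from the question: 100 GB/day, 90-day retention, 4 indexers.
print(per_indexer_storage_gb(100, 90, 0.5, 2, 4))  # -> 2250.0 GB per indexer
```

So at 100 GB/day each indexer would need roughly 2.25 TB before any RAID overhead; at the projected 300 GB/day that triples, which is worth factoring into the disk-count question.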
Only one hint: don't use RAID 5, because it is less performant than RAID 1 or RAID 1+0. This is the guidance from Splunk Support.
You can choose between RAID 1, RAID 1+0, or RAID 0 depending on your HA requirements and the features of the hosting provider: on premises I always use RAID 1+0; in the cloud you only have to be sure you have redundancy.