Storage experts: With 20 SSDs per indexer, what's the best RAID option?

twinspop
Influencer

These will be running SUSE 12. Each SSD will be 1.6TB. The systems have hardware RAID cards, but I'm tempted to go with JBOD, and use Linux tools or even ZFS to manage the volumes.

  • RAID50? E.g., 4 RAID5 groups of 5 drives each, striped together
  • RAID60?
  • Multiple RAIDZ1 or -2 with ZFS?

Our storage group recommended one giant RAID5 volume, which worries me. Rebuilding a volume that size seems risky, and losing a second drive during the rebuild would be a real possibility. Beyond that, single-drive failure protection across a 20-drive array seems like a bad idea.

EDIT: I'm trying to avoid RAID10, since it loses 50% of the raw storage.
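For reference, the capacity trade-off between these layouts can be worked out with simple arithmetic. The sketch below is only a rough illustration: it assumes 20 drives of 1.6 TB (decimal), ignores file system and controller overhead, and picks 2 groups of 10 for RAID60, which is an assumption since the post does not specify a RAID60 grouping. The `usable_tb` helper is a hypothetical name, not part of any tool. RAIDZ1/RAIDZ2 vdevs give up roughly the same number of drives to parity as RAID5/RAID6 groups of the same width.

```python
def usable_tb(drives, size_tb, groups, redundant_per_group):
    """Usable capacity when `drives` are split evenly into `groups`,
    each group giving up `redundant_per_group` drives to parity or mirroring."""
    if drives % groups:
        raise ValueError("drives must divide evenly into groups")
    return (drives - groups * redundant_per_group) * size_tb


DRIVES, SIZE_TB = 20, 1.6  # from the question; raw capacity is 32 TB

layouts = {
    "RAID5 (1 x 20)": usable_tb(DRIVES, SIZE_TB, groups=1, redundant_per_group=1),
    "RAID50 (4 x 5)": usable_tb(DRIVES, SIZE_TB, groups=4, redundant_per_group=1),
    # Assumed grouping: 2 RAID6 groups of 10 drives each.
    "RAID60 (2 x 10)": usable_tb(DRIVES, SIZE_TB, groups=2, redundant_per_group=2),
    # RAID10 modeled as 10 two-drive mirrors, one drive per pair lost to mirroring.
    "RAID10 (10 x 2)": usable_tb(DRIVES, SIZE_TB, groups=10, redundant_per_group=1),
}

for name, tb in layouts.items():
    print(f"{name}: {tb:.1f} TB usable, {tb / (DRIVES * SIZE_TB):.0%} of raw")
```

Under these assumptions, RAID50 (4 x 5) and RAID60 (2 x 10) both give up four drives, but RAID60 tolerates any two failures per group while RAID50 tolerates only one per group.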

masonmorales
Influencer

We use RAID5 on our indexers, each of which has 20x 1.92 TB SSDs. Rebuild time is ~4 hours in our environment, but that depends on whether you are using hardware or software RAID, CPU speed, etc. We are also in an indexer cluster, so we can afford to have an indexer down for a rebuild that takes several hours.

As for the file system, we have seen no performance difference between the options. We use XFS.

Are you going to be clustering your indexers? If so, there's really no reason not to go with RAID 5.

If you are in a non-clustered environment, RAID50 would work fine as well.
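As a rough sanity check on the rebuild figure, the numbers above imply a sustained write rate to the replacement drive. This is only a back-of-the-envelope sketch: it assumes the rebuild rewrites the full 1.92 TB (decimal) and that rebuild time scales linearly with drive size, neither of which is guaranteed on a given controller. The function names are hypothetical.

```python
def implied_rebuild_mb_s(drive_tb, hours):
    """Average MB/s needed to rewrite one full drive in the given time."""
    return drive_tb * 1e6 / (hours * 3600)


def estimated_rebuild_hours(drive_tb, mb_s):
    """Hours to rewrite one full drive at a given average MB/s."""
    return drive_tb * 1e6 / mb_s / 3600


# ~4 hours for a 1.92 TB SSD, as reported above.
rate = implied_rebuild_mb_s(1.92, 4)

# Assumption: the same rate applies to the 1.6 TB drives in the question.
hours_for_1_6tb = estimated_rebuild_hours(1.6, rate)

print(f"implied rebuild rate: {rate:.0f} MB/s")
print(f"estimated rebuild for 1.6 TB: {hours_for_1_6tb:.1f} hours")
```

Actual rebuild times vary with controller, load on the array, and rebuild priority settings, so treat the result as an order-of-magnitude estimate.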

twinspop
Influencer

Follow-up: RAID5 was okay at first, but its relatively poor I/O performance caught up with us. Eventually I had to re-create the volumes as RAID10. SmartStore made this fairly easy. We just updated these servers and went with fewer drives in RAID0, relying on remote storage (S2) and clustering for all redundancy.

masonmorales
Influencer

If you're interested in performance differences, you can check out the .Conf 2016 talk I did, "Architecting Splunk for Epic Performance at Blizzard Entertainment" at https://conf.splunk.com/sessions/2016-sessions.html

twinspop
Influencer

We are clustered. Currently 5 (in 2 different clusters). Soon to be 12 each. Thanks for your input!

masonmorales
Influencer

What's your RF/SF?

twinspop
Influencer

For this project we plan to be RF3/SF2.
