Storage experts: With 20 SSDs per indexer, what's the best RAID option?

twinspop
Influencer

These will be running SUSE 12. Each SSD will be 1.6TB. The systems have hardware RAID cards, but I'm tempted to go with JBOD, and use Linux tools or even ZFS to manage the volumes.

  • RAID50? e.g., four 5-drive RAID5 groups, striped
  • RAID60?
  • Multiple RAIDZ1 or -2 with ZFS?

Our storage group recommended one giant RAID5 volume, which worries me. A rebuild on a volume that size seems risky, since losing a second drive during the rebuild would be a real possibility. On top of that, single-drive failure protection across a 20-drive array seems like a bad idea.

EDIT - I'm trying to avoid RAID10, since it loses 50% of the raw storage.
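To put numbers on the trade-off between the layouts listed above, here is a minimal sketch of the capacity arithmetic for 20 x 1.6 TB drives. The `summarize` helper and the `LAYOUTS` table are hypothetical names for illustration, and the RAID60 grouping (2 groups of 10) is an assumption, since the question does not specify one. It ignores file system and controller overhead, and ZFS RAIDZ will differ somewhat because of its own metadata and padding.

```python
DRIVES = 20       # SSDs per indexer, from the question
DRIVE_TB = 1.6    # raw capacity per SSD in TB, from the question


def summarize(groups: int, redundancy_per_group: int) -> dict:
    """Usable capacity and worst-case fault tolerance for a stripe of equal groups.

    groups: number of parity or mirror groups striped together
    redundancy_per_group: drives' worth of redundancy in each group
    """
    if DRIVES % groups != 0:
        raise ValueError("groups must divide the drive count evenly")
    data_drives = DRIVES - groups * redundancy_per_group
    return {
        "usable_tb": round(data_drives * DRIVE_TB, 1),
        "usable_pct": round(100 * data_drives / DRIVES),
        # Worst case: every failure lands in the same group.
        "guaranteed_failures": redundancy_per_group,
    }


# (groups, redundancy_per_group) for each candidate layout
LAYOUTS = {
    "RAID5  (1 x 20)": (1, 1),
    "RAID50 (4 x 5)": (4, 1),
    "RAID60 (2 x 10)": (2, 2),   # grouping is an assumption
    "RAID10 (10 x 2)": (10, 1),
}

for name, (groups, redundancy) in LAYOUTS.items():
    s = summarize(groups, redundancy)
    print(f"{name}: {s['usable_tb']} TB usable ({s['usable_pct']}%), "
          f"survives any {s['guaranteed_failures']} drive failure(s)")
```

Under these assumptions, one big RAID5 gives 30.4 TB usable, RAID50 and RAID60 both give 25.6 TB, and RAID10 gives 16.0 TB. RAID50 and RAID10 can survive more than one failure if the failures land in different groups, but only one failure is guaranteed.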

masonmorales
Influencer

We use RAID5 on our indexers, which are 20x 1.92 TB SSDs. Rebuild time is ~4 hours or so in our environment, but that depends on whether you are using hardware vs software RAID, CPU speed, etc. We are also in an indexer cluster, so we can afford an indexer being down for a rebuild that will take several hours.

Performance-wise, the choice of file system makes no difference. We use XFS.

Are you going to be clustering your indexers? If so, there's really no reason not to go with RAID 5.

If you are in a non-clustered environment, RAID50 would work fine as well.
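As a rough sanity check on the rebuild time quoted above, the sketch below derives the implied sustained rebuild rate from "1.92 TB in about 4 hours" and scales it to a 1.6 TB drive. The function names are hypothetical, and the linear scaling is an assumption: real rebuild time also depends on the controller, CPU, and how busy the array is.

```python
def implied_rebuild_mb_per_s(drive_tb: float, hours: float) -> float:
    """Average rate needed to rewrite one whole drive in the given time (decimal units)."""
    return drive_tb * 1e12 / (hours * 3600) / 1e6


def scaled_rebuild_hours(ref_drive_tb: float, ref_hours: float, drive_tb: float) -> float:
    """Rebuild time for another drive size, assuming time scales linearly with capacity."""
    return ref_hours * drive_tb / ref_drive_tb


rate = implied_rebuild_mb_per_s(1.92, 4)       # figures from the post above
hours = scaled_rebuild_hours(1.92, 4, 1.6)     # 1.6 TB drives from the question
print(f"implied rebuild rate: {rate:.0f} MB/s")
print(f"estimated rebuild for a 1.6 TB drive: {hours:.1f} h")
```

This works out to roughly 133 MB/s and a little over 3 hours for a 1.6 TB drive, under the linear-scaling assumption.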

twinspop
Influencer

Follow-up: RAID5 was okay at first, but its relatively poor I/O performance caught up with us, and eventually I had to re-create the volumes as RAID10. SmartStore made this fairly easy. We just updated these servers and went with fewer drives in RAID0, relying on remote storage (S2) and clustering for all redundancy.

masonmorales
Influencer

If you're interested in performance differences, you can check out the .Conf 2016 talk I did, "Architecting Splunk for Epic Performance at Blizzard Entertainment" at https://conf.splunk.com/sessions/2016-sessions.html

twinspop
Influencer

We are clustered. Currently 5 (in 2 different clusters). Soon to be 12 each. Thanks for your input!

masonmorales
Influencer

What's your RF/SF?

twinspop
Influencer

For this project we plan to be RF3/SF2.
