Filesystem for Splunk

Contributor

So, on RHEL 6 with the latest Splunk, likely sitting on 16 10k-RPM spindles in RAID-5, is there a filesystem best suited for the job, given that both the syslog-ng files and the Splunk indexes are on the same filesystem?

SplunkTrust

(This started as a comment, but it couldn't fit in 500 characters... sorry.)

A very quick read suggests that CacheCade may not provide the desired boost to Splunk's workload. Per Dell's documentation, it is a read cache only, so it may not help substantially with the RAID-5 performance penalty for writes smaller than a full stripe. Splunk deals well with cold buckets on RAID-5 (because cold buckets are read-only), but much less so with hot buckets, because there are a lot of random writes to update both the rawdata and the index tsidx data files.
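To put rough numbers on that write penalty, here is a minimal Python sketch using the common rule-of-thumb factors: 2 back-end I/Os per front-end write for RAID-10, and 4 for a partial-stripe RAID-5 write (read data, read parity, write data, write parity). These factors are textbook approximations, not measurements of any particular controller, and they ignore controller write-back cache and full-stripe writes.

```python
# Rule-of-thumb back-end I/Os per front-end write (assumed textbook values;
# real controllers with write-back cache or full-stripe writes will differ).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


def backend_iops(frontend_iops: float, read_fraction: float, level: str) -> float:
    """Back-end disk IOPS needed to serve `frontend_iops` at the given read mix."""
    write_fraction = 1.0 - read_fraction
    reads = frontend_iops * read_fraction  # each read costs one disk I/O
    writes = frontend_iops * write_fraction * WRITE_PENALTY[level]
    return reads + writes


if __name__ == "__main__":
    # Example: 1000 front-end IOPS with 40% reads (an assumed write-heavy mix).
    for level in ("raid10", "raid5"):
        print(level, round(backend_iops(1000, 0.4, level)))
```

With 60% writes, this simple model has RAID-5 needing 2800 back-end IOPS versus 1600 for RAID-10, roughly 1.75x as much disk work for the same front-end load.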

To optimize for performance and capacity, my personal preference would be to put all 16 drives in the R720 and use 8 in a (4+4) RAID-10 for the hot buckets and 8 in a (7+P) RAID-5 for cold buckets and the operating system. Use Linux LVM to help lay out the filesystems appropriately. If you really want to use CacheCade, put it on the RAID-5 volume and let the RAID-10 be served by the operating system's filesystem cache in RAM.

Contributor

So: a 2.4T RAID-10 for hot buckets, and a 3T 5+P RAID-5 plus CacheCade (I need one spare) for the OS and cold buckets. The problem is that I have syslog on the same system, and I expect a 60/40 write:read ratio between syslog writing and Splunk reading those files, so I suspect syslog has to go on the RAID-10. But 2.4T won't be enough for my syslog data. I also plan to do some ext4 filesystem tuning, such as noatime, data=ordered and nouser_xattr, which saves some writes.
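The capacity figures in this thread can be reproduced with a short Python sketch. It assumes 600 GB drives (inferred from 2.4T across the four data spindles of a 4+4 RAID-10; the actual drive size is not stated) and ignores filesystem overhead and decimal vs. binary unit differences.

```python
# Assumed drive size: 600 GB, inferred from "2.4T raid-10" = 4 data drives.
DRIVE_TB = 0.6


def usable_tb(level: str, drives: int, drive_tb: float = DRIVE_TB) -> float:
    """Usable capacity of a simple RAID set (no hot spares, no FS overhead)."""
    if level == "raid10":
        if drives % 2:
            raise ValueError("RAID-10 needs an even number of drives")
        return (drives // 2) * drive_tb  # half the drives hold mirror copies
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return (drives - 1) * drive_tb  # one drive's worth of parity
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    print("4+4 RAID-10:", round(usable_tb("raid10", 8), 2), "TB")  # hot buckets
    print("7+P RAID-5: ", round(usable_tb("raid5", 8), 2), "TB")   # original proposal
    print("5+P RAID-5: ", round(usable_tb("raid5", 6), 2), "TB")   # revised plan
```

Under that drive-size assumption, the 7+P set would give about 4.2 TB, versus 3 TB for the 5+P set once one bay goes to the CacheCade SSD and one to a hot spare.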

Legend

What dwaddle said!! Thanks dude 🙂

Legend

First and foremost - NOT RAID-5

RAID-10 (1+0) would be much better. The normal RHEL filesystem should be fine.

There is very good info in the first part of the Installation Manual, and on the wiki:

http://wiki.splunk.com/Community:HardwareTuningFactors

http://docs.splunk.com/Documentation/Splunk/latest/Installation/Systemrequirements

Contributor

If I run a crude calculation using 1/3 reads and 2/3 writes on RAID-5 (write penalty of 4), with 1k IOPS as my target operating IOPS, I get something like this:

(1000*0.333) + (1000*0.666*4)
= 333 + 2664 = 2997 raw array IOPS to realize 1000 on RAID-5

14 spindles * 130 IOPS * CacheCade factor = 1820 * 3.5 = 6370

So, probably not 20k, but if the OS can "see" 6370 IOPS on a RAID-5 split 33% read and 66% write, does my original statement still hold? Does it really matter that it's RAID-5?
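The crude calculation above can be written as a Python sketch using the same assumptions as the post: a RAID-5 write penalty of 4, about 130 IOPS per 10k spindle, and a CacheCade multiplier of 3.5 (that multiplier is the poster's estimate, not a Dell specification). The second function inverts the formula, which matters if 6370 is read as the raw back-end figure rather than what the OS sees.

```python
RAID5_WRITE_PENALTY = 4     # read data + read parity + write data + write parity
IOPS_PER_10K_SPINDLE = 130  # per-drive figure used in the post
CACHECADE_FACTOR = 3.5      # poster's assumed SSD-cache multiplier


def raw_iops_needed(target: float, read_frac: float, write_frac: float) -> float:
    """Back-end IOPS required to deliver `target` front-end IOPS on RAID-5."""
    return target * read_frac + target * write_frac * RAID5_WRITE_PENALTY


def frontend_iops(raw: float, read_frac: float, write_frac: float) -> float:
    """Front-end IOPS a RAID-5 set can sustain given `raw` back-end IOPS."""
    return raw / (read_frac + write_frac * RAID5_WRITE_PENALTY)


if __name__ == "__main__":
    needed = raw_iops_needed(1000, 0.333, 0.666)
    raw = 14 * IOPS_PER_10K_SPINDLE * CACHECADE_FACTOR
    print(round(needed), round(raw), round(frontend_iops(raw, 0.333, 0.666)))
```

If 6370 is taken as the raw back-end figure, the same formula suggests roughly 2,100 front-end IOPS at this read/write mix, since every front-end I/O costs about 3 back-end I/Os on average.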

Contributor

In this specific case I am referring to Dell's R720 with one SSD CacheCade. Tests run in RAID-10 using 15k drives yielded 32k+ IOPS. That number is skewed by CacheCade, and it all depends. 20k was my estimate based on the same tests using 10k drives.

Contributor

How can you get 20000+ IOPS from 10K spindles? Did you mean SSD?

Contributor

Thanks for the info, but I am not 100% convinced that RAID-5 is a bad choice. RAID-5 is certainly not better than a flavor of RAID-10, but if a 14-spindle RAID-5 array offers ~20,000+ IOPS, does it really matter that it's RAID-5? The con of any flavor of RAID-10 is that you lose half the storage capacity. There are pros and cons to each, but I am not ready to say RAID-5 is a bad choice.