
What is the best storage solution for optimal Splunk performance?

mctester
Communicator

When planning my Splunk deployment, I've been told that the storage volume is probably the most important aspect. Why is this and what is the recommended hardware?

1 Solution

gkanapathy
Splunk Employee

We recommend storage that provides a very high number of random input/output operations per second (IOPS). Storage bandwidth is less of a consideration as almost any hardware is capable of providing the required throughput.

Striped disks provide high IOPS because requests are likely to be distributable over a greater number of disk spindles.
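
As a rough illustration of why more spindles help, here is a minimal back-of-the-envelope sketch; the per-disk IOPS figures and the efficiency factor are illustrative assumptions, not measured values:

```
# Back-of-the-envelope estimate of aggregate random IOPS for a striped set.
# The per-disk figures below are illustrative assumptions only; measure your
# own hardware rather than relying on these numbers.

PER_DISK_RANDOM_IOPS = {
    "7.2K RPM SATA": 80,
    "10K RPM SAS": 140,
    "15K RPM SAS": 180,
    "SATA SSD": 50_000,
}

def aggregate_iops(disk_type: str, spindles: int, efficiency: float = 0.9) -> float:
    """Estimate random IOPS for a stripe of `spindles` disks.

    `efficiency` is a fudge factor for controller overhead and uneven
    distribution of requests across spindles.
    """
    return PER_DISK_RANDOM_IOPS[disk_type] * spindles * efficiency

if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"{n:>2} x 10K RPM SAS ~= {aggregate_iops('10K RPM SAS', n):,.0f} random IOPS")
```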

We recommend RAID 10 storage for the Splunk hot/warm index volumes, as it provides the high IOPS achievable by striping over many disks, while mirroring reduces the risk of data loss from single-drive failures. Mirroring may also provide higher read IOPS.

RAID 5 storage is perfectly acceptable for cold volumes. While RAID 5 (and RAID 6) often have greatly reduced write IOPS and are therefore unsuitable for the hot volume, this is less of a problem for cold volumes, which are written only rarely and usually in large blocks.
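
To make the write-IOPS difference concrete, here is a minimal sketch based on the commonly cited RAID write penalties (roughly 2 back-end I/Os per random write for RAID 10, 4 for RAID 5, and 6 for RAID 6); the spindle count and per-disk IOPS are illustrative assumptions:

```
# Effective random-write IOPS for different RAID levels, using the commonly
# cited write penalties: RAID 10 = 2 back-end I/Os per write, RAID 5 = 4
# (read data + read parity + write data + write parity), RAID 6 = 6.
# Spindle count and per-disk IOPS below are illustrative assumptions.

WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def effective_write_iops(spindles: int, per_disk_iops: int, raid_level: str) -> float:
    """Aggregate back-end IOPS divided by the write penalty for the RAID level."""
    return spindles * per_disk_iops / WRITE_PENALTY[raid_level]

if __name__ == "__main__":
    spindles, per_disk_iops = 8, 140  # e.g. eight 10K RPM SAS drives (assumed figure)
    for level in WRITE_PENALTY:
        print(f"{level}: ~{effective_write_iops(spindles, per_disk_iops, level):,.0f} random-write IOPS")
```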

Improving IOPS with disk caches may be possible, but small caches will likely be ineffective because of the large data volumes typically involved in Splunk systems (at least in those deployments large enough that such considerations matter).

Note also that we recommend using multiple Splunk indexers and distributing data and searches across them to improve search performance. It is generally more cost-effective to achieve high Splunk performance with many machines using "good" storage than with a single machine using the absolute fastest storage possible.
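
As a rough sketch of the "many machines" point, the toy model below assumes data is spread evenly across indexers and that a dense search scans it in parallel; the data volume and per-indexer scan rate are purely illustrative assumptions and ignore search-head overhead and skew:

```
# Rough illustration of why distributing data and searches over multiple
# indexers helps: a dense search over evenly distributed data scans roughly
# 1/N of the events on each indexer in parallel. The figures below are
# illustrative assumptions and ignore search-head overhead, data skew, etc.

def estimated_search_seconds(total_gb_scanned: float,
                             per_indexer_gb_per_sec: float,
                             indexers: int) -> float:
    """Time for the slowest indexer to scan its (even) share of the data."""
    per_indexer_share = total_gb_scanned / indexers
    return per_indexer_share / per_indexer_gb_per_sec

if __name__ == "__main__":
    total_gb = 500     # assumed amount of data the search must scan
    scan_rate = 0.05   # assumed GB/s per indexer for this search type
    for n in (1, 2, 4, 8):
        print(f"{n} indexer(s): ~{estimated_search_seconds(total_gb, scan_rate, n):,.0f} s")
```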


Queboduck
Engager

For storage performance from a physical hardware perspective, I would offer a few recommendations, most of them fairly straightforward.

  1. Consider Flash Storage - somewhat obvious, but with costs decreasing this is an easy way to improve storage performance. Blizzard did a really good, in-depth analysis at .conf 2016 of the effects of flash in their direct attached storage (DAS) based environment. Blizzard also noted that Bonnie++ is not necessarily the best indicator of storage performance (one way to measure random IOPS directly is sketched after this list).
  2. Consider Scale-out Software Defined Storage (SDS) - one way to potentially improve storage efficiency, simplify management, and improve performance is to spread I/O across a number of different spindles managed by an SDS. Dell EMC has done work with Splunk to validate ScaleIO and vSAN on their platforms, and the whitepapers are posted on the [Splunk Partner Site][1].
  3. Be Aware of Physical SAN Architecture - Splunk is a scale-out application, so if you are using a physical SAN, even with flash technology, make sure you understand your storage architecture and utilization. Is your SAN architecture scale-out or scale-up? Scale-up SAN is not bad, far from it; there are a number of very large Splunk deployments I know of that run on scale-up SAN and work really well. The thing to be cognizant of, in addition to the media and back-end I/O, is the front-end capability of your fabric and service processors.
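
Expanding on the benchmarking remark in item 1, here is a minimal sketch of a 4 KiB random-read IOPS test driven from Python. It assumes fio is installed, that the target directory (the example path is hypothetical) has a few GB free, and that fio's JSON output layout matches recent versions, so treat it as a starting point rather than a definitive harness:

```
# Minimal sketch: run a 4 KiB random-read fio job and report IOPS.
# Assumes fio is installed and the target filesystem has spare space.
# fio's JSON layout can vary between versions; adjust the parsing if needed.
import json
import subprocess

def random_read_iops(target_dir: str, runtime_s: int = 60) -> float:
    cmd = [
        "fio",
        "--name=randread",
        f"--directory={target_dir}",
        "--rw=randread",
        "--bs=4k",
        "--direct=1",
        "--ioengine=libaio",
        "--iodepth=32",
        "--numjobs=4",
        "--size=1G",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
        "--output-format=json",
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    report = json.loads(result.stdout)
    return report["jobs"][0]["read"]["iops"]

if __name__ == "__main__":
    # Example path only; point this at the volume you plan to use for hot/warm.
    print(f"~{random_read_iops('/opt/splunk/var/lib/splunk'):,.0f} random-read IOPS")
```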


jrodman
Splunk Employee

To add color here: many searches are seek-dominated (needle-in-haystack), which means you want a lot of IOPS. On the storage where you are indexing, the data is actually written multiple times, so the high cost per write of RAID 5, combined with the desire for low-latency searches on the same disks, makes it a poor fit.
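
As a hedged back-of-the-envelope view of "written multiple times": Splunk writes both the compressed rawdata journal and the index (tsidx) files. The roughly 15% and 35% ratios below are commonly cited rule-of-thumb assumptions, not measurements of any particular deployment, and index-file merging can push actual bytes written higher still:

```
# Back-of-the-envelope estimate of daily write volume on the hot volume.
# Splunk writes both compressed raw data (the journal) and index (tsidx)
# files; the 15% / 35% ratios below are rule-of-thumb assumptions and vary
# a lot with data type and indexed-field settings. tsidx merging can
# rewrite index files, so actual bytes written can be higher.

RAW_COMPRESSION_RATIO = 0.15   # compressed rawdata as a fraction of original size (assumed)
INDEX_OVERHEAD_RATIO = 0.35    # tsidx and metadata as a fraction of original size (assumed)

def estimated_daily_writes_gb(daily_ingest_gb: float) -> float:
    """Approximate GB written to the hot volume per day for a given ingest rate."""
    return daily_ingest_gb * (RAW_COMPRESSION_RATIO + INDEX_OVERHEAD_RATIO)

if __name__ == "__main__":
    for ingest in (100, 250, 500):
        print(f"{ingest} GB/day ingest -> ~{estimated_daily_writes_gb(ingest):.0f} GB/day written")
```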

Mick
Splunk Employee

The faster your storage, the faster Splunk will be able to index and search data. High I/O capability is a must. The deployment docs recommend 10K RPM disks with RAID 10, but if you can get a faster SAN, that will work too.

Steer well clear of RAID 5; it just doesn't perform well.
