EBS backed storage vs local storage with index replication on Amazon Web Services?

lquinn
Contributor

I am looking to move my Splunk environment onto Amazon Web Services. I have read through a lot of the documentation regarding recommended hardware for my Splunk EC2 instances. It seems there are two options with regard to backing up data and disaster recovery:

Use c4.??xlarge instances with EBS-backed storage, or d2.??xlarge instances with HDD-based local storage and index replication between the instances.

Data loss prevention is extremely important to me and is my highest priority here. Can anyone give me recommendations, or the pros and cons of each method? Thanks!

1 Solution

emiller42
Motivator

We're actually evaluating the same thing. Here are the quick bullet points I've come up with so far:

C4 + EBS volumes:

Pros:

  • Persistent storage. (Can be detached and reattached to the same instance, or a new one)
  • High IOPS. EBS volumes are SSD backed. Even without using Provisioned IOPS, you get very good numbers here. (We see approx 3000 random seeks/sec from a 5TB EBS store)

Cons:

  • (Comparatively) Low throughput. At best, you're going to get 500MB/sec read/write if you're using a c4.8xl. Thankfully, they're EBS optimized by default, which is a requirement, IMO. (otherwise your incoming data and disk IO are competing for throughput) In our experience, this has been a bottleneck, and a primary reason I'm looking at the d2's myself.
  • Cost. EBS volumes are expensive, and they don't benefit from the kinds of savings you get from reserved instances.
  • Half the RAM of the D2's.

D2s:

Pros:

  • WAY more storage capacity for your dollar.
  • Double the RAM of the c4's
  • Much better throughput. Bonnie++ testing of a d2.4xl with the ephemeral in a raid 0 showed sequential read/write speeds up to 2 GB/s.

Cons:

  • Lower IOPS. In my testing, bonnie++ showed random seek numbers in the 400-600/sec range, much lower than the SSD-backed EBS volumes.
  • Ephemeral storage. Data is gone if the instance is stopped or destroyed. (It does persist through OS reboots however)

If you're going to consider the d2's, you have to do so with index replication. Leverage the location awareness of the feature with AWS Availability Zones. (An indexer in each AZ configured such that a copy of each bucket exists in all of them) With the increased capacity of the d2 instances, it's much easier on the pocket to use replication.

The nice thing about using the d2's with replication is that it lets you be more flexible in scaling your environment in response to load. (This may or may not matter depending on your particular use case.) You can add or remove indexers from the cluster to scale up or down, as long as you give the cluster the opportunity to recover when pulling instances.

The cost differences can depend on the capacity you need and how you plan to buy it. The 70% discount on 1 year pre-paid d2 instances makes a BIG difference when comparing cost, too. So do the math on your particular needs. I've found this tool very helpful in determining what's needed: Splunk Sizing

One thing I will recommend is, even if you don't use replication, set up your indexers in a cluster from the start. It will make your life worlds easier if you decide to change your replication settings in the future, and makes configuration management easier.

hemendralodhi
Contributor

So what did you finally go with? I am also looking at both EBS and d2. I am mainly concerned with performance, not price. Which performs best when replication is in place?
Thanks
Hemendra

emiller42
Motivator

We recently switched from EBS to d2's, and are very happy with the results so far. Using d2.4xls with the ephemeral storage in a RAID 0 gives us IOPS in the 7k range, and we can handle indexing throughput spikes of up to 7MB/s without appreciable queue saturation. Yes, EBS can get better IO, but the fact that you're throttled on throughput makes it hard to actually take advantage of it.

It's been about 4 months so far, and we haven't had an indexer fail yet. We have had one go temporarily unavailable for about 20 minutes, but there was no data loss. The cluster handled that without any real end-user impact.

All in all, I would have a hard time recommending EBS-backed instances unless the install is small enough that running multiple d2s is overkill. You definitely want to have replication happening if you use d2s.
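
To put the 7MB/s figure above in terms of daily volume, here is a quick conversion sketch. It assumes decimal units (1 GB = 1000 MB) and treats the rate as if it were sustained for a full day, which a spike by definition is not. The post does not say whether the figure is per indexer or cluster-wide, and the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mb_per_sec_to_gb_per_day(mb_per_sec):
    """Convert a sustained rate in MB/s to GB/day (decimal units)."""
    return mb_per_sec * SECONDS_PER_DAY / 1000


# The 7 MB/s spike quoted above, if it were held for 24 hours.
peak_as_daily_gb = mb_per_sec_to_gb_per_day(7)
print(f"7 MB/s sustained is about {peak_as_daily_gb:.1f} GB/day")
```

This is only a way to compare a peak rate against a daily ingestion figure; real ingestion is bursty, so the average rate will sit well below the peak.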

hemendralodhi
Contributor

Thanks Emiller for the quick response. We will have ~600GB per day of data ingestion and 90 day retention, and I am planning to use 4 indexers (RF=3, SR=3), with 2 instances in each AZ (AZ1/AZ2).
Either use c4.4xlarge with IOPS SSD (30 days of data) and Throughput Optimized HDD (st1) for the remaining 60 days of data, or else 4 d2.4xlarge instances.

What do you think about this configuration?

Thanks
Hemendra
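
As a rough sanity check on the configuration above, the storage requirement can be sketched as follows. The 15% (compressed rawdata) and 35% (index files) ratios are a commonly cited rule of thumb and are an assumption here, not measurements from this thread; actual ratios vary with the data. "SR=3" is read as a search factor of 3, and the function name is hypothetical.

```python
def clustered_storage_gb(daily_gb, retention_days, rf, sf,
                         rawdata_ratio=0.15, index_ratio=0.35):
    """Estimate total on-disk storage across an indexer cluster, in GB.

    Assumption: rawdata is stored once per replicated copy (rf) and
    index files once per searchable copy (sf). The default ratios are
    rule-of-thumb values, not measured ones.
    """
    per_day_gb = daily_gb * (rawdata_ratio * rf + index_ratio * sf)
    return per_day_gb * retention_days


# Figures from the post: ~600 GB/day, 90 day retention, RF=3, SF=3, 4 indexers.
total_gb = clustered_storage_gb(daily_gb=600, retention_days=90, rf=3, sf=3)
per_indexer_gb = total_gb / 4

print(f"cluster total: {total_gb / 1000:.1f} TB")
print(f"per indexer:   {per_indexer_gb / 1000:.2f} TB")
```

Under these assumptions the cluster needs roughly 81 TB in total, or about 20 TB per indexer, before any headroom. That per-indexer number is worth comparing against the usable capacity of whichever instance type or EBS layout is chosen.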
