I'm going to be deploying Splunk on the following hardware:
2x Dell PowerEdge R715, each with:
- 16GB RAM
- PERC H700 RAID controller with 512MB battery-backed cache
- 6xHUC101212CSS600 HDDs (SAS 2.5" 6Gb/s, 1.2TB, 10K RPM, 4.6ms seek)
giving a total theoretical array performance of approx. 395 IOPS, or 790 IOPS peak (bonnie++ benchmarking gives somewhere between 650 and 950 random seeks/sec, with the higher figure when the test set size is twice the RAM, i.e. 32GB)
- 2xOpteron 6328 CPU (3.2GHz, 8 cores/CPU, 16MB L3 cache/CPU)
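For reference, one common back-of-the-envelope way to arrive at theoretical figures like the ones above is to add the average seek time to the average rotational latency (half a revolution), invert that to get per-disk IOPS, multiply by the number of spindles, and halve the result for RAID 10 random writes. This is only a sketch: the write penalty of 2 and the reading of 790 as the random-read peak and 395 as the random-write figure are assumptions, not something measured on the controller.

```python
# Rough theoretical IOPS estimate for a 6-disk RAID 10 array.
# Drive figures come from the spec list above; the RAID 10 write
# penalty of 2 is an assumption (each write lands on both mirrors).
RPM = 10_000
AVG_SEEK_MS = 4.6
DISKS = 6
RAID10_WRITE_PENALTY = 2


def per_disk_iops(rpm, avg_seek_ms):
    """Estimate random IOPS for one spindle."""
    # Average rotational latency is the time for half a revolution.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)


single = per_disk_iops(RPM, AVG_SEEK_MS)
read_peak = single * DISKS                      # all spindles serve reads
write_floor = read_peak / RAID10_WRITE_PENALTY  # every write hits two disks

print(f"per disk:             {single:.0f} IOPS")
print(f"array, random reads:  {read_peak:.0f} IOPS")
print(f"array, random writes: {write_floor:.0f} IOPS")
```

The bonnie++ numbers are higher than the write figure because the battery-backed controller cache and command queuing help in practice, which this simple model ignores.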
We currently have a 10GB/day Splunk license.
This is our first real deployment of Splunk, so we're not really sure what we're going to find it useful for, which apps we are likely to use, or how many users it'll have. We're using this as a learning exercise to determine whether Splunk is of value to us, and if so, to work out where we need to concentrate attention (and money!) in any future deployment.
It seems as though the best use of this hardware is to run one R715 as a search head and the other as a dedicated indexer. Search heads seem to be mostly CPU- and memory-bound according to http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Referencehardware, so I'm debating how to make the best use of the hardware on which the search head will be running:
1) Export the 3.1TB of RAID 10 disc via NFS, mount it on the indexer and use it for storage of cold Splunk buckets (we also plan to roll old buckets onto iSCSI SAN storage that will be mounted by the indexer)
2) Run an indexer on the search head as well (how would this affect performance?)
3) Use the storage for something unrelated that uses relatively little CPU (e.g. non-Splunk'ed rsyslog)
4) Other...
What's the received wisdom from the crowd here?
EDIT: Thanks to Martin Mueller for his quick answer. I should have mentioned that we already have a toy deployment where some colleagues have been trying some things out. I think that's already at the 10GB/day limit, and we have lots more data sources we'd potentially like to Splunk. More than we can afford, probably... So the plan is to buy license volume as we need it, but stick with this hardware for the next few years.
With 10GB of daily volume you'd be fine running one of those boxes as an all-in-one Splunk, assuming you don't do weird things like having a billion users churning through the data all day long. One box with the six drives in RAID 10 should give you a retention time of well over a year, more than enough to get started.
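As a rough sanity check on that retention figure, here is a minimal sketch. It assumes the often-quoted rule of thumb that indexed data takes up about half of the raw volume on disk; the real ratio depends heavily on the data, and the function name and `disk_ratio` parameter are just illustrative. It also ignores space taken by the OS, Splunk itself, and any free-space headroom.

```python
def retention_days(usable_tb, daily_ingest_gb, disk_ratio=0.5):
    """Estimate how many days of data fit on the array.

    usable_tb       -- usable array capacity in TB (3.1 in the question)
    daily_ingest_gb -- raw data indexed per day, in GB
    disk_ratio      -- ASSUMPTION: on-disk size as a fraction of raw size
    """
    usable_gb = usable_tb * 1000
    return usable_gb / (daily_ingest_gb * disk_ratio)


# Current 10GB/day license, and a doubled license for comparison.
for ingest_gb in (10, 20):
    days = retention_days(3.1, ingest_gb)
    print(f"{ingest_gb} GB/day -> roughly {days:.0f} days of retention")
```

Under these assumptions, 10GB/day lands comfortably past the one-year mark, and even a doubled license still leaves most of a year on a single box.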
Sure, you could use one as a search head (SH) and one as an indexer (IDX), but the gain is small for normal situations and somewhat offset by the network hop between the two. Here's a potentially better alternative: set up two all-in-one instances, with one of them acting as license master for the other.
That way you could use one as your testing/sandbox instance and move stuff that's somewhat "done" to the more production-ish instance to keep running without being messed with too much.
Additionally, starting with one all-in-one instance keeps things simple and gets you up and running quickly.
When you either add license volume or run out of space in a year or three, you can still consider upgrading your hardware.