How to accomplish total redundancy?
Right now, I have two indexers with distributed search, but they each have separate indexes, so if a node goes down, I am missing half my data.
I have a fast network share, so putting the indexes there isn't a problem, but I can't have both indexers write to the same index.
Could I have them each write to a separate index, but search across both? Wouldn't I get the same results twice?
Should one be an indexer, but the other a "fall back" indexer?
How to accomplish total redundancy? I can run web on both and put a load balancer in front no problem.
NOTE: The "forward to the other one for indexing as well" seems to imply I need to double my license. Not an option.
<pedantic>
"Totally redundant" is not attainable with today's commodity hardware. </pedantic>
There are a couple of ways of achieving higher availability, however.
The typical configuration in the Splunk reference architecture is to have forwarders send a copy of the forwarded data stream to two different indexer farms. For argument's sake, call them indexer farms A and B. Indexers A1, A2, A3 ... An have one copy of the forwarded data. Indexers B1, B2, B3 ... Bn have another copy. As you mentioned, this doubles your Splunk license requirement. I've heard of Splunk marketing an "HA" license that allows for this without paying double.
Another option (but one that isn't in the Splunk reference architecture) is a quasi-standard shared-storage cluster. Say you have an iSCSI array that can do sufficient IOPS for 4 indexers. Then you configure 5 servers to all have iSCSI access to the same spindles, and set up a cluster manager to run 4 Splunk instances across those 5 servers. The cluster manager would be responsible for making sure that there's only one copy of the 4 instances running, arbitrating access to the iSCSI volumes, moving around IP aliases that correspond to each Splunk instance, etc. It's really not different from how you'd use MSCS for SQL Server, or how you'd use Red Hat's cluster product in a similar scenario.
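To make the "4 instances across 5 servers" idea concrete, here is a rough Python sketch of only the placement decision a cluster manager makes. It is not how MSCS or Red Hat's cluster product is implemented, and it leaves out fencing, storage arbitration, and IP alias moves entirely. The function name, the instance and server names, and the one-instance-per-server rule are all assumptions made for illustration.

```python
def place_instances(instances, servers, healthy, current):
    """Return {instance: server} with each instance on exactly one healthy server.

    Hypothetical helper: assumes at most one Splunk instance per server.
    """
    placement = {}
    used = set()
    # Leave an instance where it is if its server is still healthy.
    for inst in instances:
        srv = current.get(inst)
        if srv in healthy and srv not in used:
            placement[inst] = srv
            used.add(srv)
    # Move displaced instances onto free healthy servers (the spare, in the 4-on-5 case).
    free = [s for s in servers if s in healthy and s not in used]
    for inst in instances:
        if inst not in placement:
            if not free:
                raise RuntimeError(f"no healthy server left for {inst}")
            placement[inst] = free.pop(0)
    return placement
```

With four instances on servers s1 to s4 and s5 idle, marking s2 unhealthy moves only the instance that was on s2 over to s5; the other three stay put.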
The shared-storage cluster has no obvious impact on your licensing, but it adds a good bit of complexity. Shared storage clusters can be notoriously difficult to get right and keep maintained in proper working order over the long haul. And, because it's shared storage, the storage itself can be a point of failure. It also eschews Splunk's reference architecture of using many mid-sized commodity nodes with fast local disk. Your shared storage would have to be robust enough to support the I/O requirements of 4 indexers.
At the end of the day, it's an economic exercise comparing the cost of additional Splunk license (for the duplicated data option) versus the cost of shared storage and the additional setup and long-term care and feeding of a cluster. And the right answer depends on lots of things specific to your own environment.
Well, there is a wrinkle here. Our new data center is all VMs, with no local disk at all. It's surprisingly fast. I think rather than have two VMs, I will just get one larger one. In a DR situation, we can bring up the VM in a new data center, and that can be scripted/automated.
So there's no way to make this automatic?
Ugh... I wonder how I'd notice the start of queuing? I guess I have to set up an alert...
You can have two indexers pointing at the same file system for their indexes and only turn on one at a time. All data should come from forwarders that load balance between the two with indexer acknowledgement turned on. This would give you a warm standby.
When everything is working fine, the forwarder sends data to indexer A and cannot connect to indexer B.
When indexer A fails, the forwarder will not be able to reach either indexer and will start queuing. You notice this and start indexer B. The forwarder detects this and starts sending to indexer B, which saves to the same file system.
The file system would still be a single point of failure, though, so you need to make your own arrangements to back this up.
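One way the "you notice this" step might be scripted is a small probe that checks whether either indexer is accepting connections on its receiving port. The Python below is only a sketch under assumptions: the function names are made up, it tests plain TCP reachability rather than anything Splunk-specific, and it deliberately only suggests an action instead of starting indexer B itself, since starting B while A is still running would put two writers on the same index.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def standby_action(a_up: bool, b_up: bool) -> str:
    """Map the observed state of both indexers to a suggested action."""
    if a_up and b_up:
        # Both listening means two writers on one index: the state to avoid.
        return "ALERT: both indexers are up, stop one immediately"
    if a_up or b_up:
        return "ok"
    # Neither is listening, so forwarders are queuing.
    return "ALERT: no indexer reachable, confirm A is stopped, then start B"
```

Run from cron or a monitoring tool, something like `standby_action(is_reachable("indexer-a.example.com", 9997), is_reachable("indexer-b.example.com", 9997))` would produce the alert text; the host names are placeholders, and 9997 is just the port commonly used for receiving, so substitute whatever your indexers listen on.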