Goal:
Load balance across two indexers writing to the same location on a NetApp Filer (NFS)
Question: (I am new to Splunk, so I may be asking the wrong question to begin with)
How can I configure my Splunk setup with an index on shared storage to handle dynamic load balancing between two indexers?
My understanding:
Thank you very much!
Your understanding is mostly correct. (Although it doesn't really change anything, you can read from and execute searches against an index shard that you're not writing to, at least in theory. There is an index setting in indexes.conf, isReadOnly, that supposedly makes an instance not write to an index, but I've never used it. You are correct that only one instance can write to an index shard, though; I'm not sure whether there is actually any lock on the files, or whether the instance simply assumes that it's the sole owner.)
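For reference, that setting would look something like this in indexes.conf (the index name and paths here are made up for illustration):

    # indexes.conf on an instance that should only search, not write
    [index1]
    homePath   = /mnt/netapp/splunk/shard1/index1/db
    coldPath   = /mnt/netapp/splunk/shard1/index1/colddb
    thawedPath = /mnt/netapp/splunk/shard1/index1/thaweddb
    # Supposedly prevents this instance from writing to the index;
    # as I said, I haven't tested it myself.
    isReadOnly = true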
You will need to set up basically four instances of Splunk, two on each node (one active and one failover) and two "shards" of each index:

- sA-1: active on node A, writes to shard iA (shard1/index1)
- sA-2: standby on node B, pointing at the same shard iA
- sB-1: active on node B, writes to shard iB (shard2/index1)
- sB-2: standby on node A, pointing at the same shard iB
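To make the shard idea concrete, here is a sketch of the indexes.conf paths (mount point and directory layout are assumptions); note that both pairs use the same index name, just rooted at different shard directories on the NFS mount:

    # indexes.conf for the sA pair (sA-1 active, sA-2 standby) - shard 1
    [index1]
    homePath   = /mnt/netapp/splunk/shard1/index1/db
    coldPath   = /mnt/netapp/splunk/shard1/index1/colddb
    thawedPath = /mnt/netapp/splunk/shard1/index1/thaweddb

    # indexes.conf for the sB pair (sB-1 active, sB-2 standby) - shard 2
    [index1]
    homePath   = /mnt/netapp/splunk/shard2/index1/db
    coldPath   = /mnt/netapp/splunk/shard2/index1/colddb
    thawedPath = /mnt/netapp/splunk/shard2/index1/thaweddb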
You will have to adjust the network port numbers so that the sA-* instances don't conflict with the sB-* instances when both are running on the same node.
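For example (the port numbers here are arbitrary), the two instances sharing a node might be separated like this:

    # web.conf on the first instance on the node
    [settings]
    httpport     = 8000
    mgmtHostPort = 127.0.0.1:8089

    # web.conf on the second instance on the same node
    [settings]
    httpport     = 8001
    mgmtHostPort = 127.0.0.1:8090

    # the receiving ports in inputs.conf must differ as well
    # first instance:
    [splunktcp://9997]
    # second instance:
    [splunktcp://9998]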
In case of a failure, you would ensure that the failed node and splunkd process were stopped, then start up the corresponding standby instance on the other node. You would also do whatever is needed to switch the IP/hostname of the instance to point to the standby node. This can be done manually, via clustering software, or with a VIP on a network load balancer.
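A minimal manual failover could look like this (install paths and instance names are assumptions, and this is a sketch, not a tested procedure):

    # On the failed node (if still reachable), make sure the active
    # instance is really down:
    /opt/splunk-sA-1/bin/splunk stop

    # On the surviving node, bring up the standby for the same shard:
    /opt/splunk-sA-2/bin/splunk start

    # Finally, repoint the VIP/DNS entry for sA at the surviving node
    # so forwarders and search heads follow it.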
I will also warn that while indexing over NFS will work, it is harder to guarantee the IOPS you'd want for excellent search performance. If your NFS is up to it, it should work fine. However, since no shard will be used on more than one node at a time, it's also possible to use SAN volumes rather than NFS for each index shard.
I will also add that most of what you get from this setup will be rendered unnecessary by index replication within the Splunk product in an upcoming release. It is quite different from what I've described here, but provides similar functionality.
In which case the load balancing at the forwarders would only need to worry about the indexers, each indexer would write to its own shard of the index, and the search head(s) would basically treat it as if the index were being written locally on each of the separate indexers.
That's a very simple and elegant solution. Thanks.
Well, I would say sA-1 knows only iA (shard1/index1), and sA-2 is standby on the other node but knows the same iA there. sB-1 is on the same node as sA-2, and knows iB (shard2/index1).
Am I understanding you correctly: the indexes are named the same, but have different paths on the NFS? In this case, we would have to make sure each indexer only knows about one of the indexes.

Edit: (since we are not currently concerned about resiliency)
The index name is the same on both sides. The shard does not have a distinct name; it's just a different path that is set within the indexer config. If you mean "indexer" rather than "index", though, you just list both instances (or rather the virtual names/IPs of each primary instance) and let forwarder load balancing deal with it.
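On the forwarder side that would be something like this in outputs.conf (the host names are placeholders):

    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    # The virtual names/IPs of the two primary instances; the
    # forwarder load-balances across whatever is listed here.
    server = sA.example.com:9997, sB.example.com:9997
    autoLB = true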
Thanks for the response.
Ignoring resiliency, and relying on the load-balancing feature of the forwarders, how would I specify the index name in inputs.conf, since we won't know which indexer it's feeding? Unless the intention is to hardcode the load balancing...
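(For what it's worth, since the index name is identical on both shards, the index can be named in inputs.conf without knowing the destination; a sketch, with a made-up monitor path:)

    # inputs.conf on the forwarder
    [monitor:///var/log/app]
    index = index1
    # Which indexer receives this is decided by the outputs.conf
    # load balancing, not here; both shards expose "index1".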