I've heard that Splunk 5 clusters do not work well when the nodes are separated over long distances (such as a WAN). Why is that? Would it make a difference to the cluster if we had a high-speed connection (like 100 Mbps) between two data centers? What are the other limitations of clustering across multiple data centers?
From the System requirements and other deployment considerations topic in the Managing Indexers and Clusters manual:
"With sufficiently high-quality connections, it is possible to deploy the cluster across data centers. However, in the current version of Splunk, the cluster is not site-aware. For example, in a scenario where you have peer nodes spread across two data centers, you cannot specify that one replicated copy of the cluster data reside on nodes in one data center and a second copy reside on nodes in a second data center. When the master determines how some set of data gets replicated across the cluster, it does not take peer location into consideration."
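To make the "not site-aware" point concrete, here is a rough sketch that counts how many replica placements would leave every copy in a single data center when peers are picked without regard to location. This is not Splunk's actual placement logic, which the quoted docs do not describe; the peer names, site labels, and helper functions are all made up for illustration.

```python
from itertools import combinations

# Hypothetical peers and sites; not taken from any real deployment.
PEER_SITE = {
    "peer1": "dc1", "peer2": "dc1", "peer3": "dc1",
    "peer4": "dc2", "peer5": "dc2", "peer6": "dc2",
}


def placements(peer_site, replication_factor):
    """Every set of peers that a location-unaware chooser could pick."""
    return list(combinations(sorted(peer_site), replication_factor))


def single_site_placements(peer_site, replication_factor):
    """Placements whose copies all sit in the same data center."""
    return [
        peers
        for peers in placements(peer_site, replication_factor)
        if len({peer_site[p] for p in peers}) == 1
    ]


all_sets = placements(PEER_SITE, 2)
risky = single_site_placements(PEER_SITE, 2)
print(f"{len(risky)} of {len(all_sets)} possible placements "
      "keep both copies in one data center")
```

With three peers per data center and two copies of the data, 6 of the 15 possible peer pairs sit entirely in one data center, so losing that data center would take out both copies of whatever landed on such a pair.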
Have you gotten to test it yet? If so, what was your experience?