We currently have one Splunk server, which several dozen of my coworkers use for ad-hoc queries and some scheduled reports. I need to scale our Splunk deployment to improve performance, but I only have 3-4 machines to work with. I am trying to decide between a Splunk cluster and a distributed deployment. My primary goal is improved search performance. Redundancy of the indexes would be nice but is not a requirement.
What is the smallest number of servers required for a Splunk cluster? If I understand the Cluster manual correctly, I need at least three hosts, or four hosts (including two peers) to ensure data redundancy.
In a cluster, can a search head also be a peer node? That would reduce the number of servers by one. Would this incur a performance penalty for searches? Any reference to the Splunk manual would be helpful.
Hi Stefan:
A cluster is a distributed deployment, just with index replication enabled and a Master Node to manage it.
If you are only interested in performance, just do a distributed deployment.
Yes, you can run your search head on an indexer. When scaling from a single Splunk server to more servers, many of our customers first add a second server as an indexer, so their total configuration is one server acting as both search head and indexer, and a second server acting only as an indexer.
The performance penalty for running a search head on an indexer depends on how much searching you're doing, the types of searches, and how much data they cover. Basically, you'll require one CPU core on every server that your search is running on for the duration of the search. If your search head is running on an indexer, I believe that you'll be using two cores for each active search. This means that if you're running four total servers, and one of the servers is hosting the search head as well as an indexer, that server is likely to be resource constrained in both CPU and memory.
There are some descriptions and example calculations on this Splunk documentation page: "Accommodate concurrent users and searches"
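To make the core arithmetic above concrete, here is a rough Python sketch. It only encodes the rule of thumb from this answer (one core per active search on each server, and, as I believe, two on a server that also hosts the search head). The function names and the 8-core, 6-search figures are made up for illustration and are not Splunk sizing guidance.

```python
def cores_in_use(active_searches: int, hosts_search_head: bool) -> int:
    """Estimate CPU cores busy with searches on one server.

    Rule of thumb from the answer above: one core per active search,
    or two per search if the server is both search head and indexer.
    """
    cores_per_search = 2 if hosts_search_head else 1
    return active_searches * cores_per_search


def is_cpu_constrained(active_searches: int, hosts_search_head: bool,
                       total_cores: int) -> bool:
    """True if the estimated search load exceeds the server's cores."""
    return cores_in_use(active_searches, hosts_search_head) > total_cores


# Hypothetical numbers: 6 concurrent searches on 8-core servers.
indexer_only = cores_in_use(6, hosts_search_head=False)
combined = cores_in_use(6, hosts_search_head=True)
print(indexer_only, combined)
print(is_cpu_constrained(6, True, total_cores=8))
```

With these assumed numbers, the indexer-only servers still have headroom while the combined server is oversubscribed, which is the kind of contention to watch for. Indexing load and memory are not modeled here.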
Since it's not easy to calculate your search and indexing load manually, I suggest you download and use the Splunk on Splunk App on your existing Splunk server. It will show you all kinds of useful search and indexing performance statistics, and as you add more Splunk servers to your environment you can see how your performance changes, and decide as you scale up whether you're happy with the performance of a search head on one of your indexers, or running it separately.
If it were me, and I had access to four servers, I'd probably deploy two servers as indexers and one server as a search head and indexer. Then I'd monitor it for a while and see how the performance was and how much resource contention there was on the server running the search head. If performance was poor and resource contention was high on the search head box, I'd deploy the fourth server as a search head and switch the combined server to an indexer only. If I was having no problems with the combined server, I'd add the fourth server as an indexer.
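The rollout I describe above can be written down as a small decision sketch in Python. The role labels and the function name are hypothetical, and treating "poor performance AND high contention" as the only trigger for a dedicated search head is my reading of the plan, not a Splunk rule.

```python
# Starting layout with three of the four servers in use.
INITIAL_LAYOUT = ["indexer", "indexer", "search head + indexer"]


def layout_with_fourth_server(performance_poor: bool,
                              contention_high: bool) -> list[str]:
    """Return the role of each of the four servers after monitoring."""
    if performance_poor and contention_high:
        # Move the search head to its own box; the combined server
        # becomes an indexer only.
        return ["indexer", "indexer", "indexer", "search head"]
    # No problems on the combined server: add more indexing capacity.
    return INITIAL_LAYOUT + ["indexer"]


print(layout_with_fourth_server(True, True))
print(layout_with_fourth_server(False, False))
```

Either way you end up with three indexers; the only question the monitoring answers is whether the search head needs a machine to itself.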
Best of luck!
Jon
Search peers, search heads, and the cluster coordinator/master in a cluster must be separate instances of Splunk. However, they could reside on the same physical node or the same OS instance. The actual load on a master is not high, and it might be okay to co-locate the two on the same instance. However, since you're looking at clustering, you should think about what happens if that node goes down, since a lost node will mean losing more than one Splunk instance.
The Splunk on Splunk app is wrong in its CPU consumption dashboards: it assumes that "search duration = CPU consumption", which is totally wrong.
Thank you for clarifying the relationship between a regular "distributed deployment" and the new "Clusters" which were introduced in Splunk 5. I've been reading the Splunk documentation, and I can tell there is some overlap between the two architectures, but I am having a really hard time determining the similarities and differences.