How can I design my search head and indexer clustering architecture?

rubeniturrieta
Communicator

Hi everyone,

I have a design with four Splunk instances (two search heads and two indexers). I want an indexer cluster (for replication and fault tolerance) and a search head cluster (for search efficiency). I'll send only syslog to the indexers (no forwarders).

I need two searchable data copies and I have some questions:

1.- Do I need more Splunk instances?
2.- Do I need to send syslog to only one indexer, or the same syslog to two indexers?
3.- If I send data to only one indexer, with replication, will I have the same data in two indexers?
4.- If I send same data to two indexers, with replication, will I have data copies twice, in two indexers?
5.- If one indexer is down, will the other one be enough for service continuity?
6.- If I have a traffic balancer, only for sending syslog data, can I send data to any indexer, do I need any special consideration?

I'd be very grateful for any help.

Thank you

Yasaswy
Contributor

1. Do I need more Splunk instances?
Ideally, yes. A search head cluster requires a minimum of 3 members, so with just two search heads you will not be able to have a search head cluster.
As for indexers, your options depend on your need for resiliency: the cluster can tolerate the failure of (replication factor - 1) peer nodes. There are also benefits to having a separate cluster master for your indexers.
A search head cluster will need a deployer as well. You do have the option of having one server take on multiple roles. Still, if your preference is to have both a search head cluster and an indexer cluster, you will need more servers for a clean deployment.
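
To make the count concrete, here is a minimal Python sketch that tallies instances for a "clean" deployment. It assumes one dedicated server per role, which is only an illustration since roles can be co-located as noted above. The function name and its parameters are hypothetical, not part of any Splunk tooling.

```python
def dedicated_instances(indexers: int, search_heads: int = 3) -> dict:
    """Tally Splunk instances, assuming one role per server (an assumption)."""
    if search_heads < 3:
        # A search head cluster requires a minimum of 3 members.
        raise ValueError("a search head cluster needs at least 3 members")
    roles = {
        "indexers (peer nodes)": indexers,
        "search head cluster members": search_heads,
        "indexer cluster master": 1,
        "search head cluster deployer": 1,
    }
    roles["total"] = sum(roles.values())
    return roles


plan = dedicated_instances(indexers=2)
print(plan["total"])  # 7
```

With the two indexers from the question, that comes to seven instances instead of the four in the original design.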

2. Do I need to send syslog to only one indexer, or the same syslog to two indexers?
To get the maximum performance benefit, it's preferable to distribute your data across all indexers. That way, when a search hits the indexers, each peer node processes its own share of the results and returns them much faster.

3. If I send data to only one indexer, with replication, will I have the same data in two indexers?
Yes. Otherwise, your data would be lost when an indexer goes down. Clustering requires the indexers to maintain replicated copies of the data (as defined by the replication factor).

4. If I send same data to two indexers, with replication, will I have data copies twice, in two indexers?
I am assuming you mean that you want to distribute the data across two indexers, not send the "same" data (a clone) to both. You also have the option of using an intermediate forwarder, which will load balance the data for you. But yes, depending on how you send the data, there is the possibility of having duplicates, which will then get replicated.

5. If one indexer is down, will the other one be enough for service continuity?
Yes. If you enable clustering, a replication factor of 2 will ensure availability even on failure of 1 peer node.

6. If I have a traffic balancer, only for sending syslog data, can I send data to any indexer, do I need any special consideration?
Sending data to any indexer is fine, though as explained it's better to distribute the data.

Additionally, there is excellent documentation on clustering. Check out Indexer Clustering and Search Head Clustering.

Good luck with the deployment.

rubeniturrieta
Communicator

All answers are useful. Thank you very much.

Regards

Steve_G_
Splunk Employee

Answers to these questions are, by and large, readily available in the documentation.

For information on indexer clustering system requirements, see:
http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Systemrequirements

For search head clustering system requirements, see:
http://docs.splunk.com/Documentation/Splunk/6.3.0/DistSearch/SHCsystemrequirements

For basic information on how replication works in an indexer cluster, see:
http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Basicclusterarchitecture

Lucas_K
Motivator

Do I need more Splunk instances? Yes 😉 You need an indexer cluster master and a search head cluster deployer. Search head clustering also requires a minimum of 3 search heads.

Do I need to send syslog to only one indexer, or the same syslog to two indexers? No, just one. The cluster master will trigger bucket replication based on the configured replication settings.

If I send data to only one indexer, with replication, will I have the same data in two indexers? Yes. However, the master will mark one copy as the primary bucket, and only that one will be accessed as the source of events when a search is triggered. The other is a backup.

If I send same data to two indexers, with replication, will I have data copies twice, in two indexers? Technically, you'll have 4 copies: 2 sets of originals and 2 sets of replicas. Each indexer will not know that the other has had duplicate events sent to it. There is no deduplication of incoming events, so be careful that you only forward a single copy!
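
The arithmetic behind the "4 copies" figure can be sketched in a few lines of Python. This is only an illustration, assuming every cloned stream is indexed as an independent original and then replicated according to the replication factor. The function name is made up for the example.

```python
def total_copies(cloned_to: int, replication_factor: int) -> int:
    """Copies of each event stored across the cluster.

    cloned_to: how many indexers receive the same event from the sender.
    replication_factor: copies the cluster keeps of each indexed bucket.
    """
    return cloned_to * replication_factor


print(total_copies(cloned_to=2, replication_factor=2))  # 4 (cloning plus replication)
print(total_copies(cloned_to=1, replication_factor=2))  # 2 (send once, let the cluster replicate)
```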

If one indexer is down, will the other one be enough for service continuity? This depends on the replication settings you have chosen. You need a search factor greater than 1 to be able to continue searching the entire data set when an indexer is down.

If I have a traffic balancer, only for sending syslog data, can I send data to any indexer, do I need any special consideration? Yes, you can send it to any available indexer. Make sure your load balancer is checking the status of the listening service (port 9997 is the usual port for forwarder traffic; for raw syslog, check whichever port your network input listens on).

masonmorales
Influencer

This is the best answer IMO (Yasaswy's, above). I'm going to chime in with a few comments, though:
1. Search factor will also affect service continuity (unless you are okay with waiting for bucket fix-up activities to complete). To avoid this, set both your search factor and your replication factor to 2, at minimum.
2. You can also virtualize search heads to get the minimum 3 Splunk instances required for a search head cluster.
3. Universal Forwarders are completely capable of load balancing across indexers, so you don't need an intermediate forwarder to do this.
4. Data cloning will double-count against your license, whereas indexer clustering maintains replicated copies of your data without double-counting.
5. You should always load-balance your data across multiple indexers.
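
To illustrate point 1, here is a rough Python sketch of how replication factor and search factor bound what happens when peers fail. It is a simplified worst-case model, not Splunk's actual fix-up logic, and it assumes the cluster was fully healthy before the failure. The function name and return strings are invented for the example.

```python
def after_peer_failures(failed: int, replication_factor: int, search_factor: int) -> str:
    """Worst-case outcome after losing `failed` peers at once (simplified model)."""
    if not 1 <= search_factor <= replication_factor:
        raise ValueError("search factor must be between 1 and the replication factor")
    if failed >= replication_factor:
        # Every copy of some bucket may have lived on the failed peers.
        return "possible data loss"
    if failed >= search_factor:
        # Copies survive, but some may be non-searchable until fix-up completes.
        return "data retained, full search after bucket fix-up"
    # At least one searchable copy of every bucket survives.
    return "data retained, searchable copies still available"


print(after_peer_failures(failed=1, replication_factor=2, search_factor=2))
print(after_peer_failures(failed=1, replication_factor=2, search_factor=1))
```

With both factors at 2, losing one of two peers still leaves a searchable copy of everything. With a search factor of 1, the surviving copies may first have to be made searchable.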

ppablo
Retired

Hi @rubeniturrieta

I deleted your earlier post since it was a duplicate of this one and just left this one up after making some edits, in case you were wondering what happened to the other post. Cheers!

Patrick

rubeniturrieta
Communicator

Hi ppablo!

Thank you very much!
