I am working with a partner in Brazil on a 1.6TB/day deployment.
The partner has some questions about the best approach to follow, and I think you might be able to help with some ideas.
Here is their scenario:
The bank is allocating 8 servers as indexers. Each indexer will have 12 disks of 600GB, for a total of 7.2TB each.
Their idea is to use a multisite cluster architecture with a replication factor of 2, allocating 2 indexers (servers) to each site.
They would have 4 sites (all in the same physical data center). They refer to each pair of indexers that replicate data between them as a "site".
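As a rough sketch of that layout (the site names and stanzas below are illustrative assumptions, not their actual config), the multisite settings would live in `server.conf`:

```
# server.conf on the cluster master -- illustrative sketch only
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2,site3,site4
# Exact factors depend on whether the 2 copies should stay within a site:
site_replication_factor = origin:1,total:2
site_search_factor = origin:1,total:2

# Each peer would use mode = slave and set site = to the pair it belongs to.
```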
According to the partner, the current scenario would give them about 30 days of retention with the available storage on these servers. The customer's ideal retention would be about 60 days, so they are looking for ways to optimize storage and improve the retention.
The partner is looking for recommendations or best practices for deploying this architecture that optimize the available storage and increase the retention potential at the same time.
They are wondering whether the Multi Site Cluster approach will be the best suited here.
Any insights you might have are very appreciated!
Why are they doing multi-site if they have only one location? They have less availability if they do that. Why not put all 8 indexers in a single site/cluster? If they want maximum storage with some redundancy, they should simply use a SF of 1, and an RF of whatever they are comfortable with. If it's 2, that's one extra copy.
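For comparison, the single-site alternative is much simpler to configure. A minimal sketch of the master's `server.conf` (RF and SF values taken from the suggestion above, everything else assumed):

```
# server.conf on the cluster master -- single-site sketch
[clustering]
mode = master
replication_factor = 2
search_factor = 1
```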
This seems like a misuse of clustering. It's not clear to me what your goals and priorities are. The point of clustering is to ensure availability of search and data, and it uses space (i.e., retention) to do so. If all you want is retention and protection of the data, and don't care about it being available in case of trouble, then you should probably not use clustering. Instead, you can use backups or rsync to make copies of the data in other places. Or, if you're simply concerned about scheduled maintenance, martin_mueller is correct that you should use maintenance mode.
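If you go the backup route, a minimal rsync sketch could look like this (the paths and host below are hypothetical, adjust for your index locations):

```
# Copy an index's cold buckets to a backup host; paths are made up.
rsync -a /opt/splunk/var/lib/splunk/defaultdb/colddb/ \
      backuphost:/backups/splunk/defaultdb/colddb/
```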
Creating a new copy when a peer holding that bucket goes down is a good thing. Else you're open to a single point of failure, namely the other peer holding the existing copy of that bucket.
If you're worried about unnecessary copying and "searchable-making" during scheduled maintenance of peers you can use the maintenance mode to avoid that. http://docs.splunk.com/Documentation/Splunk/6.1.2/Indexer/Usemaintenancemode
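For reference, maintenance mode is toggled on the cluster master before and after the work on the peers:

```
# On the cluster master, before taking a peer down:
splunk enable maintenance-mode
# ... perform peer maintenance, then:
splunk disable maintenance-mode
```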
An RF of 2 will give one extra copy, but if the node that originated a log goes down, the node that received the replica will not only put those buckets into a searchable state, it will also start replicating those buckets to other available nodes, because the cluster needs to keep 2 copies of each bucket.
It's a totally unmanaged scenario where, during that server's downtime, there will be 3 copies of the same log.
What we (Wellington and I) are seeking is to keep bucket replicas only inside their own site, while still allowing all search heads to search the data on all sites.
The site configuration is being used here as a configuration tool, not because of different physical locations.
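If I understand the goal correctly, that can be expressed with site factors whose origin and total counts are equal, so both copies stay on the originating site. A hedged sketch, assuming an SF of 1 (site names assumed):

```
# server.conf on the cluster master -- sketch, keeps replicas in-site
[clustering]
mode = master
multisite = true
available_sites = site1,site2,site3,site4
site_replication_factor = origin:2,total:2
site_search_factor = origin:1,total:1
```

Search heads attached to any site can still search the whole cluster; site affinity only controls which copies they prefer to search.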
What's their motivation for using a multi-site cluster despite there being only one physical site?
To do the retention-time maths, a few clarifications on the numbers:
You mentioned an RF of 2, does that imply an SF of 1?
Is that an RF of 2 for the entire cluster or per site?
12 disks with 600GB each giving you 7.2TB - are they using RAID0 only?
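To make the maths concrete, here is a back-of-the-envelope sketch. The 15% (rawdata journal, stored on every copy) and 35% (index files, stored only on searchable copies) compression ratios are common rules of thumb, not measured numbers from this deployment, so treat the results as rough estimates only:

```python
# Rough retention estimate -- illustrative assumptions, not measured data.
# rawdata journal ~ 15% of ingested size, kept on every copy (RF)
# index/tsidx files ~ 35% of ingested size, kept on searchable copies (SF)

def retention_days(total_storage_tb, daily_ingest_tb, rf, sf,
                   raw_ratio=0.15, index_ratio=0.35):
    """Days of retention given cluster-wide storage and replication settings."""
    daily_footprint_tb = daily_ingest_tb * (rf * raw_ratio + sf * index_ratio)
    return total_storage_tb / daily_footprint_tb

storage = 8 * 7.2   # 8 indexers x 7.2 TB each = 57.6 TB (assumes RAID0-like usable space)
ingest = 1.6        # TB/day

print(round(retention_days(storage, ingest, rf=2, sf=2)))  # ~36 days
print(round(retention_days(storage, ingest, rf=2, sf=1)))  # ~55 days
```

Under these assumptions, dropping from SF=2 to SF=1 alone buys a large chunk of the retention they're after, which is why the RF/SF questions above matter so much.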