We'd like to set up distributed search, but it doesn't look like we'll be able to afford a second cluster of search peers for redundancy. If I understand things correctly (which may very well not be the case), this leaves me with two potential options (for simplicity's sake, we'll assume two search peers):
OPTION 1: Send only half the data (by forwarder config or through load-balancing) to each search peer (indexer).
OPTION 2: Send all data to both search peers (indexer).
ASSUMED CON: Search performance decreases because each peer has to index double the data.
ASSUMED CON: Double the disk space is needed for the desired retention.
ASSUMED CON: Must load-balance searches or add dedup to every search.
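To put rough numbers on the disk and license cost of the two options, here is a minimal Python sketch. The daily volume, retention, and function names are hypothetical placeholders; it counts raw volume only (index compression is ignored) and assumes every indexed copy counts against the daily license volume.

```python
# Toy comparison of the two options; all figures are hypothetical placeholders.
DAILY_GB = 50        # hypothetical raw data volume per day
RETENTION_DAYS = 90  # hypothetical desired retention
PEERS = 2            # two search peers, as in the question


def option1(daily_gb, days, peers):
    """Data split evenly: each peer indexes 1/peers of the volume, once."""
    per_peer_daily = daily_gb / peers
    return {
        "per_peer_daily_gb": per_peer_daily,
        "per_peer_disk_gb": per_peer_daily * days,  # raw GB, compression ignored
        "license_gb_per_day": daily_gb,  # each event is indexed exactly once
    }


def option2(daily_gb, days, peers):
    """Data mirrored: every peer indexes a full copy."""
    return {
        "per_peer_daily_gb": daily_gb,
        "per_peer_disk_gb": daily_gb * days,  # raw GB, compression ignored
        "license_gb_per_day": daily_gb * peers,  # assumption: each copy is licensed
    }


if __name__ == "__main__":
    for name, fn in (("Option 1", option1), ("Option 2", option2)):
        print(name, fn(DAILY_GB, RETENTION_DAYS, PEERS))
```

With two peers, option 2 doubles both the per-peer disk and the daily license volume relative to option 1.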
Knowing full well that only I can answer this: which option is worse? Hopefully someone can tell me that I just don't understand the intentions of distributed search, or that there is some other solution.
With only two servers, Option 2 is not really something I'd recommend. Unless your requirements are very particular to what it does, you should not use distributed search with it at all; since both indexers will have the same data, you can simply load-balance the UI between them instead. Beyond that, note that the second option also requires double the license volume, since you're indexing once, forwarding, and indexing again.
Using dedup is not the right answer: first, it will massively slow down every search, and second, you won't be able to tell whether there legitimately are two identical entries.
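The second problem with dedup can be shown with a toy Python model. This is not Splunk's `dedup` command, just an analogous "keep the first identical event" filter over hypothetical events, where one login failure genuinely happened twice in the same second.

```python
from collections import Counter

# Hypothetical source events as (timestamp, raw text). The failure for alice
# really occurred twice, so it legitimately appears twice at the source.
source_events = [
    ("2011-03-01 10:00:00", "login failed user=alice"),
    ("2011-03-01 10:00:00", "login failed user=alice"),
    ("2011-03-01 10:00:05", "login ok user=bob"),
]

# Option 2: both peers index a full copy, so searching both returns each event twice.
peer_a = list(source_events)
peer_b = list(source_events)
merged = peer_a + peer_b


def dedup(events):
    """Keep only the first occurrence of each identical event."""
    seen = set()
    out = []
    for ev in events:
        if ev not in seen:
            seen.add(ev)
            out.append(ev)
    return out


deduped = dedup(merged)

if __name__ == "__main__":
    alice = ("2011-03-01 10:00:00", "login failed user=alice")
    print("in source:", Counter(source_events)[alice])
    print("after dedup:", Counter(deduped)[alice])
```

The filter removes the mirrored copies, but it also collapses the real repeat, because identical events carry nothing that tells a mirror from a genuine duplicate.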
Option two is expensive, and ineffective anyway when you only have two servers. In this case, you must choose one of two sub-options:
Not knowing anything about your business requirements, I would generally suggest option 1 first, in combination with RAID disks and regular backups of the data. Indexing onto highly available networked storage is also an option: it would let you remount the volume after a server failure instead of restoring from backup, though it doesn't help if the storage controller corrupts the data or someone accidentally "rm"s the index.
Thanks, gkanapathy. Good point about the license. I was just using two as an example; we are pushing for four. Since we value performance over availability (at least until something goes down, LOL), I'm assuming a single cluster of four search peers is better than two mirrored clusters of two search peers.
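The performance assumption can be sketched with a small Python function. It assumes search time on each peer scales roughly with the share of the data that peer holds, that data is spread evenly within a cluster, and that each search goes to only one cluster; search-head merge overhead is ignored, and the function name is hypothetical.

```python
def per_peer_share(total_peers, copies):
    """Fraction of all events each peer scans for a search over everything.

    total_peers: indexers purchased.
    copies: full copies of the data (1 = one cluster, 2 = two mirrored clusters).
    """
    if copies < 1 or total_peers % copies:
        raise ValueError("peers must divide evenly into the clusters")
    peers_per_cluster = total_peers // copies
    return 1 / peers_per_cluster


if __name__ == "__main__":
    print("one cluster of 4:", per_peer_share(4, 1))
    print("two mirrored clusters of 2:", per_peer_share(4, 2))
```

Under these assumptions, each peer in a single cluster of four scans a quarter of the data per search, versus half in a mirrored pair.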
Under option one (splitting the data among the search peers), is this best managed by load-balancing or by deliberately dividing up where the forwarders send their data?
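The trade-off can be illustrated with a toy Python model. The forwarder names and daily volumes are made up. "Static" pins each forwarder to one indexer, while "load-balanced" has every forwarder rotate equal slices of its data across all indexers. Real forwarders switch targets on their own schedule, so this only approximates the effect on how data ends up distributed.

```python
from itertools import cycle

# Hypothetical forwarders and their daily volume in GB (made-up figures).
FORWARDERS = {"web01": 40, "web02": 35, "db01": 5, "mail01": 4}
INDEXERS = ["idx1", "idx2", "idx3", "idx4"]


def static_split(forwarders, indexers):
    """Pin each forwarder to one indexer, as with deliberate forwarder config."""
    load = {idx: 0 for idx in indexers}
    for fwd, idx in zip(sorted(forwarders), cycle(indexers)):
        load[idx] += forwarders[fwd]
    return load


def load_balanced(forwarders, indexers, chunks=96):
    """Each forwarder cuts its volume into equal slices and rotates through indexers.

    Each forwarder starts at a different indexer; chunks is the number of
    switching intervals per day (a hypothetical parameter).
    """
    load = {idx: 0.0 for idx in indexers}
    n = len(indexers)
    for start, (fwd, gb) in enumerate(sorted(forwarders.items())):
        slice_gb = gb / chunks
        for i in range(chunks):
            load[indexers[(start + i) % n]] += slice_gb
    return load


if __name__ == "__main__":
    print("static:", static_split(FORWARDERS, INDEXERS))
    print("load-balanced:", load_balanced(FORWARDERS, INDEXERS))
```

With these made-up volumes, static assignment leaves one indexer with 40 GB/day and another with 4, while rotation gives each about 21 GB/day. Deliberate division is only as even as the forwarders' own volumes.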