I thought I'd come back here and mention that I've since built our Splunk cluster using Splunk clustering. What I missed in the clustering documentation is that you can use the replication factor and search factor to cut down on the number of nodes that hold full, searchable copies of the data. The remaining copies can be kept in a much more compact form, on top of Splunk's already good compression.
Also, a consultant told me that if you set up active/passive, the new node has to run a series of checks against your data before it starts up after a failover, which slows the failover down greatly.
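For anyone sizing this up, here's a rough sketch of how the replication factor and search factor change the disk footprint. It assumes that every replicated copy keeps the compressed rawdata while only the searchable copies also keep index files. The 15% and 35% ratios are rule-of-thumb assumptions, not measurements from our cluster, and `cluster_storage_gb` is just an illustrative helper name. Real ratios vary a lot with the data.

```python
def cluster_storage_gb(daily_gb, retention_days, replication_factor, search_factor,
                       rawdata_ratio=0.15, index_ratio=0.35):
    """Estimate total disk used across an indexer cluster, in GB.

    rawdata_ratio and index_ratio are assumed rule-of-thumb values:
    compressed rawdata as a fraction of ingested volume, and index
    files as a fraction of ingested volume.
    """
    if search_factor > replication_factor:
        raise ValueError("search factor cannot exceed replication factor")
    ingested = daily_gb * retention_days
    # Every replicated copy stores the compressed rawdata.
    rawdata = ingested * rawdata_ratio * replication_factor
    # Only searchable copies also store the index files.
    index_files = ingested * index_ratio * search_factor
    return rawdata + index_files


if __name__ == "__main__":
    for rf, sf in [(1, 1), (2, 1), (2, 2)]:
        total = cluster_storage_gb(5, 365, rf, sf)
        print(f"RF={rf} SF={sf}: about {total:.0f} GB for 5 GB/day kept 1 year")
```

Under those assumptions, a second copy that is not searchable costs noticeably less than a second full copy, which is the saving I'd overlooked.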
Hmm, I must admit I'd not looked at the reference hardware, as I know what we need to use: namely spare blades from our last refresh and space on one of our SANs. The 1.8TB now is not such a huge deal, but part of our backup strategy is offsite replication (via the SANs), so there's another copy. Simply running 2 servers is another copy again, and then you up our limit to 10GB/day, 20GB/day, etc. When you say Splunk can be run perfectly well in a shared storage environment, what are you referring to, may I ask? As in active/passive, or something else? Can you elaborate, please?
We're putting in Splunk in the next few months as part of PCI compliance. I'm just getting the ball rolling and starting my learning cycle, so I'm pretty new to it all.
The first step, which I'm doing now, is to architect our Splunk deployment. Looking around, I'm somewhat baffled to find that there seems to be no way to use shared storage and HA between devices to fail over if a node goes down. What I'm envisioning is iSCSI disk mounts on 2 physical nodes (indexers), where one node is active and the other is a standby. If the active goes down, the standby takes over. Is this possible with Splunk?
Assuming not, as I've not read anything about it (why not?! This is basic stuff), it seems my only other options are an HA license to clone streams to 2 indexers, or cluster replication. By my interpretation, both of these options use literally twice as much storage space, which seems to fly right in the face of everything we've learned about de-dupe. I understand this is a performance boon as well, but even at our measly 5GB a day for the first year while we put it in, with a requirement to keep 1 year of data, that's an extra 1.8TB just to have HA. If we moved to 10 or 20GB a day, it seems like such a waste of storage. I'm just trying to understand it here.
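To show where the 1.8TB figure comes from, here's the back-of-the-envelope arithmetic as a small sketch. It counts raw ingested volume only and ignores any compression Splunk applies on disk, so treat it as an upper-bound style estimate. The function name `retained_gb` is just something made up for illustration.

```python
def retained_gb(daily_gb, retention_days, copies=1):
    """Raw ingested volume held for the retention period, before compression."""
    return daily_gb * retention_days * copies


if __name__ == "__main__":
    for daily in (5, 10, 20):
        one_copy = retained_gb(daily, 365)
        # A second full copy (cloned stream or replica) doubles the footprint.
        extra_for_ha = retained_gb(daily, 365, copies=2) - one_copy
        print(f"{daily} GB/day: {one_copy} GB for one copy, "
              f"{extra_for_ha} GB extra for a second copy")
```

At 5GB a day, that's 1825GB for one year of data, so a second full copy adds roughly another 1.8TB, and the gap scales linearly as the daily volume goes up.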
I've thought about some other options and was wondering if anyone had tried these:
Build a 2-host VMware cluster and put the indexer on that. If we want to add another indexer, then we add another host to the cluster (so N + 1, basically). That way the indexer has dedicated resources but is protected against hardware failure.
Use heartbeat or some other open source HA software to monitor the process and fail it over. It just seems strange to use old-school open source stuff to make a product as developed as Splunk highly available.
Use our hardware load balancers (F5s) to essentially make 1 server active and only send traffic to the other if the first goes down. But what happens if I'm running 2 instances of Splunk pointing at the same indices without proper shared-storage clustering software, even if only one is reading/writing at a time? Would that cause issues?
Appreciate any help, thanks.