Hi All,
We have a customer who could not justify the cost of a clustered solution, so they went down the following route.
Basic System
2x Indexers with Splunk frontends
3x Universal Forwarders
Data from Forwarders
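For context, the setup described above roughly corresponds to an outputs.conf like the following on each forwarder. This is a sketch; the host names, ports, and group names are assumptions, not the customer's actual config:

```ini
# outputs.conf on each universal forwarder (illustrative only)
[tcpout]
# Listing two target groups in defaultGroup clones the data to both
# indexers; listing both servers inside one group would load-balance
# between them instead.
defaultGroup = indexer1,indexer2

[tcpout:indexer1]
server = indexer1.example.com:9997

[tcpout:indexer2]
server = indexer2.example.com:9997
```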
It was envisioned that if Indexer-1 dies, Indexer-2 will still be chugging along with a similar data set that is polled less frequently.
This all currently works perfectly.
However, if you take one of the indexers offline, the universal forwarders' queues fill up because they cannot send data to the offline indexer. The whole forwarder grinds to a halt and no new data is sent to the indexer that is still online.
While I understand the system is protecting against data loss, the whole system grinding to a halt is actually much worse.
I thought blockOnCloning in outputs.conf might resolve this, since the Unix TA logs are cloned, but based on its default behaviour this is not what is causing the queue to fill up either.
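For reference, blockOnCloning is set in the [tcpout] stanza and controls whether cloned output blocks when any one target group's queue is full. A minimal sketch, with the group names assumed:

```ini
# outputs.conf (sketch): with cloned output, blockOnCloning = true
# (the default) blocks all groups as soon as one group's queue fills;
# setting it to false lets the other groups keep receiving data.
[tcpout]
defaultGroup = indexer1,indexer2
blockOnCloning = false
```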
dropEventsOnQueueFull does not appear to behave how I would expect. The documentation seems to indicate that it doesn't drop the queue contents it cannot deliver (due to the indexer outage); it just keeps the queue full and drops any new data. So instead of getting rid of the data that is causing the blockage and continuing, it drops everything new? That seems a bit backwards to me.
Is there any way to resolve this?
I don't care if data is lost for the offline indexer; I just want my remaining online indexer to keep receiving data.
dropEventsOnQueueFull (in outputs.conf) seems to have resolved it, even though the manual seems to indicate it does the exact opposite and drops NEW events.
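For anyone hitting the same problem, here is roughly what worked, as a sketch; the 30-second timeout and host names are illustrative choices, not recommendations. dropEventsOnQueueFull defaults to -1 (block forever); a positive value means "wait that many seconds for the full queue to drain, then start dropping":

```ini
# outputs.conf (sketch): per-group drop timeout so a dead indexer's
# queue stops blocking the forwarder while the other group keeps flowing.
[tcpout:indexer1]
server = indexer1.example.com:9997
dropEventsOnQueueFull = 30

[tcpout:indexer2]
server = indexer2.example.com:9997
dropEventsOnQueueFull = 30
```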
http://docs.splunk.com/Documentation/Splunk/latest/admin/outputsconf
May I recommend that a Splunk staff member reword the manual entry for this setting to be less ambiguous. The way it is currently worded suggests that once the queue is full, any new events arriving at the forwarder will be dropped; it makes no mention of removing/dropping data from the queue itself.
Is there a better solution here?
I have also tried setting queues in inputs.conf, which had no effect.
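One possible reason those settings had no effect: the queue settings in inputs.conf (queueSize / persistentQueueSize) apply to network and scripted inputs, not to monitored files, which are throttled upstream by the output queue instead. A sketch of where they do apply, with the port and sizes assumed:

```ini
# inputs.conf (sketch): queueSize / persistentQueueSize only take
# effect on input types such as tcp/udp, e.g. a syslog listener.
[udp://514]
queueSize = 10MB
persistentQueueSize = 100MB
```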