Universal Forwarder blocking stops all indexing completely

phoenixdigital
Builder

Hi All,

We have a customer who could not justify the cost of a clustered solution, so they went down the following route.

Basic System

2x Indexers with Splunk frontends
3x Universal Forwarders

Data from Forwarders

  1. One set of polling logs goes to Indexer-1
  2. A second set of logs goes to Indexer-2 (the same data sent to Indexer-1, but polled less frequently)
  3. The Unix TA logs go to both indexers
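For reference, a routing setup like the one above might look roughly like this on each forwarder. This is a sketch under assumptions: the group names (indexer1_group, indexer2_group), host names, port 9997 and the monitored paths are hypothetical placeholders, and the polling logs are assumed to be file monitor inputs. Inputs without an explicit `_TCP_ROUTING` fall through to `defaultGroup`, and listing both groups there clones that data (e.g. the Unix TA inputs) to both indexers.

```ini
# outputs.conf (group names, hosts and port are hypothetical)
[tcpout]
# Inputs with no _TCP_ROUTING (e.g. the Unix TA) are cloned to both groups
defaultGroup = indexer1_group, indexer2_group

[tcpout:indexer1_group]
server = indexer1.example.com:9997

[tcpout:indexer2_group]
server = indexer2.example.com:9997

# inputs.conf (monitor paths are hypothetical)
[monitor:///var/log/polling/frequent]
_TCP_ROUTING = indexer1_group

[monitor:///var/log/polling/infrequent]
_TCP_ROUTING = indexer2_group
```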

The idea was that if Indexer-1 dies, Indexer-2 will keep chugging along with a similar data set, just polled less frequently.

This all currently works perfectly.

However, if you take one of the indexers offline, the universal forwarders' output queues fill up because they cannot send data to the offline indexer. The whole forwarder then grinds to a halt and no new data is sent to the indexer that is still online.

While I understand the system is protecting against data loss, having the whole system grind to a halt is actually much worse.

I thought blockOnCloning in outputs.conf might resolve this, since the Unix TA logs are cloned, but judging by its default behaviour it is not what causes the queue to fill up either.
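For reference, the outputs.conf specification (worth double-checking against your Splunk version) describes blockOnCloning as a global [tcpout] setting that defaults to true. Setting it to false only allows cloned events to be dropped when all cloned groups are down and their queues are full, so it would not be expected to help when just one of two indexers is offline. A minimal sketch of where it goes:

```ini
# outputs.conf: blockOnCloning lives in the global [tcpout] stanza (default: true).
# false only permits drops when ALL cloned groups are down and their queues
# are full, so it does not unblock the case where one indexer is still up.
[tcpout]
blockOnCloning = false
```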

dropEventsOnQueueFull does not appear to behave the way I would expect. The documentation seems to indicate that it doesn't drop the queued data it cannot deliver (due to the indexer outage); it just keeps the queue full and drops any new data. So instead of getting rid of the data causing the blockage and carrying on, it just drops everything new??? Seems a bit backwards to me.

Is there any way to resolve this?

I don't care if data destined for the offline indexer is lost; I just want the remaining online indexer to keep getting data.

phoenixdigital
Builder

Setting dropEventsOnQueueFull (in outputs.conf) seems to have resolved it, even though the manual seems to indicate it does the exact opposite and drops NEW events.
http://docs.splunk.com/Documentation/Splunk/latest/admin/outputsconf
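A minimal sketch of that change, assuming it is set in the global [tcpout] stanza so it covers every target group; the 10-second value is an arbitrary example, not a recommendation. Per the outputs.conf spec, the default of -1 (or 0) makes the output queue block when full, and if any target group's queue is blocked, no more data reaches any other target group, which matches the behaviour described above. As I understand it, a positive value means events that cannot be queued for the offline group are thrown out after that wait instead of blocking, so the group that is still online keeps receiving data.

```ini
# outputs.conf on each universal forwarder
[tcpout]
# Default -1: block when the output queue is full; one blocked target group
# then stops data reaching every other target group.
# Positive value (seconds; 10 is an arbitrary example): after waiting this long,
# throw out new events that cannot be queued instead of blocking.
dropEventsOnQueueFull = 10
```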

May I suggest that a Splunk staff member reword the manual entry for this setting to make it less ambiguous?

  • If set to a positive number, wait <integer> seconds before throwing out all new events until the output queue has space.

change to

  • If set to a positive number, wait <integer> seconds before throwing out all events already in the queue until the output queue has space. New events arriving at the indexer will still be placed onto the queue.

The way it is currently worded suggests that once the queue is full, any new events arriving at the indexer will be dropped. It makes no mention of removing or dropping data from the queue itself.

Is there a better solution here?

I have also tried setting queues in inputs.conf, which had no effect.
