In the last day or two, all the queues on one indexer got filled up. We bounced it, and now on another indexer all the queues are close to 100%. What could it be?
Normally, for months and months, at this point of the day all the queues would be quite empty. However, h2709 is still pretty bad:

[screenshot: h2709 queue fill levels]
I took h2709 out of the forwarder rotation (for the most part) and it took around 25 minutes to clear all queues. After 25 minutes, h2709's queues are fine...
The binding with h8788 remained throughout the night, and this server has already processed half a terabyte of data.
Thank you @mwirth for working with us!!! So, one forwarder sends us huge amounts of Hadoop/Flume data, and just yesterday we received 1 TB of data from this single forwarder.
We end up with a forwarder-indexer binding. How can we avoid it?
Usually there are three things that block up queues.
In this case, it's pretty clear the indexer in question is getting 2x the instantaneous indexing rate of the other indexers. My question is: is this server usually that much higher than the others?
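If you'd rather check that from a search than eyeball dashboards, something along these lines against the internal metrics should line the indexers up side by side (a rough sketch; adjust the span and time range to your environment):

```
index=_internal source=*metrics.log* group=thruput name=thruput
| timechart span=5m avg(instantaneous_kbps) by host
```

A host that sits consistently well above the rest, not just during this incident, would answer that question.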
Okay, so that means the forwarder(s) in question are successfully sending to multiple indexers; that's great!
Now we need to find out what datasource is causing that indexing bandwidth. Go to the monitoring console and open the "Indexing Performance: Instance" dashboard. Scroll down to the "Estimated Indexing Rate Per Sourcetype" panel and see if there are any outliers.
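If you prefer SPL, a search roughly equivalent to that panel would be something like the one below; replace `<indexer>` with the struggling host. Keep in mind metrics.log only records the top handful of series per sampling interval by default, which is why the panel is labelled "Estimated":

```
index=_internal source=*metrics.log* host=<indexer> group=per_sourcetype_thruput
| timechart span=5m sum(kb) by series
```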
EDIT:
That feast/famine cycle (where an instance shows an enormous indexing rate with full queues, then drops to nearly nothing) is just the data load-balancing over to another server and the backed-up queues draining to disk. Very normal in this circumstance.
Dark purple and dark green look like the likely suspects; take note of those sourcetypes. Can you confirm the same spike in indexing load from those sourcetypes on the other hosts during the time window when they had issues?
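To check that, a variation of the same metrics search should work, with the suspect sourcetype names filled in for the placeholders:

```
index=_internal source=*metrics.log* group=per_sourcetype_thruput
    series IN ("suspect_sourcetype_1", "suspect_sourcetype_2")
| timechart span=5m sum(kb) by host
```

If only one host shows the spike, you're almost certainly looking at a single forwarder pinned to that indexer.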
This is normally going to happen because a single forwarder is sending a very high amount of bandwidth. It can be addressed in a couple of different ways:
1. Increase the number of threads on the forwarder, since each thread can send to a distinct indexer (see the config sketch after this list).
2. If the data is coming from a centralized data source (like syslog, etc.), spread the load out across multiple hosts.
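For option 1, assuming "threads" here means parallel ingestion pipelines (the usual way to get multiple independent output connections from a single Splunk forwarder), the relevant knobs live in server.conf and outputs.conf on the forwarder. A minimal sketch with placeholder indexer names and ports; check your version's defaults and make sure the forwarder has the CPU headroom before raising the pipeline count:

```
# server.conf on the busy forwarder (assumes the host can spare the extra cores)
[general]
parallelIngestionPipelines = 2

# outputs.conf - list every indexer and keep the load-balance switch interval short
[tcpout:primary_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997, indexer3.example.com:9997
autoLBFrequency = 30
```

Each pipeline load-balances independently, so two pipelines on the same forwarder can be talking to two different indexers at the same time.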
For some perspective, 6 MB/s sustained over a 24-hour period works out to roughly 518 GB/day (6 MB/s × 86,400 s), which is well outside the recommended 200-250 GB/day per indexer. No wonder the poor servers are struggling!