Hello,
I have the following scenario:
5 indexers with 500 GB of storage for splunk indexes;
1 indexer with 1 TB of storage for splunk indexes.
All indexers have the same set of indexes and forwarders are sending events in autoLB mode.
What happens if an indexer runs out of space? Does the forwarder send the event to another indexer?
Thanks in advance and kind regards.
Luca Caldiero.
Not enough disk space could cause Splunk to crash, depending on which version you are running. The first thing you will see is an indexer congestion warning. You will be notified that Splunk is dropping index=_internal events in an effort to preserve your data. If the indexer congestion is not remedied, the queues will become blocked, and the indexer will stop listening on all data input ports.
In other words, the indexer will begin to refuse TCP connections from all forwarders. The forwarders will log this, and will periodically re-attempt connections to that indexer. In the meantime, the forwarder will continue to send to other indexers that it is load-balancing across without losing any of your data.
Here are a couple of messages you will see in the splunkd log if this happens:
01-28-2015 22:19:13.617 +0000 WARN AuditTrailManager - skipped indexing of internal audit event will keep dropping events until indexer congestion is remedied. Check disk space and other issues that may cause indexer to block
01-28-2015 22:19:32.767 +0000 INFO TcpInputProc - Stopping IPv4 port 8105
01-28-2015 22:19:32.767 +0000 WARN TcpInputProc - Stopping all listening ports. Queues blocked for more than 300 seconds
On the forwarder side, you will see something like:
01-28-2015 22:19:36.429 +0000 WARN TcpOutputFd - Connect to 10.11.22.33:8105 failed. Connection refused
Then, it will connect to the next indexer and continue forwarding data if that connection is successful.
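If you want to spot this condition automatically, the log messages quoted above can be matched with a few regular expressions. This is only a rough sketch: the patterns are copied from the excerpts in this answer, other Splunk versions may word the messages differently, and the function name `classify_line` is made up for illustration.

```python
import re

# Patterns derived from the log excerpts quoted in this thread.
# Assumption: the wording may differ in other Splunk versions.
INDEXER_PATTERNS = {
    "congestion": re.compile(r"indexer congestion"),
    "ports_stopped": re.compile(r"TcpInputProc - Stopping all listening ports"),
}
FORWARDER_REFUSED = re.compile(
    r"TcpOutputFd - Connect to (\S+) failed\. Connection refused"
)


def classify_line(line):
    """Return (label, detail) for a recognised splunkd.log line, else None."""
    match = FORWARDER_REFUSED.search(line)
    if match:
        # detail is the host:port the forwarder failed to reach
        return ("connection_refused", match.group(1))
    for label, pattern in INDEXER_PATTERNS.items():
        if pattern.search(line):
            return (label, None)
    return None
```

You could feed it each line of splunkd.log (on the indexer or the forwarder) and alert on any result that is not None.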
Is there no mechanism for preferring to send buckets to an indexer with more space? I currently have 3 indexers with 1.3TB, one with 900GB, and am about to add one with 500GB. Will that just force me to stop at about 500GB per indexer? That would suggest that the indexers in a cluster all need to be roughly similar systems, at least in storage size.
I am very close to adding the 500GB system (RAID 10) to the cluster of 1.3TB systems (RAID 5), and I noticed the smaller system was warning that the index partition had reached the 80% in-use alert level, which made me wonder.
The forwarder is not aware that the indexer is out of space. I am pretty sure that the forwarder will continue to send data to the indexer that is full. However, if you go to the "full" indexer and block the receiving port, then no forwarders will be able to connect. Then they will send to other indexers instead.
I suggest that you set up some mechanism for monitoring and alerting before the disk becomes full.
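As a starting point, here is a minimal sketch of such a check in Python, using only the standard library. The 80% threshold mirrors the alert level mentioned earlier in this thread; the path in the usage note and the function names are assumptions for illustration, so adjust them to wherever your indexes actually live.

```python
import shutil


def percent_used(used_bytes, total_bytes):
    """Percentage of the partition that is in use."""
    return 100.0 * used_bytes / total_bytes


def check_partition(path, warn_percent=80.0):
    """Return (over_threshold, percent_used) for the partition holding path.

    warn_percent defaults to 80, the alert level mentioned in this thread;
    pick whatever leaves you enough time to react.
    """
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.used, usage.total)
    return pct >= warn_percent, pct


if __name__ == "__main__":
    # Hypothetical index location; replace with your own index partition.
    index_path = "/opt/splunk/var/lib/splunk"
    try:
        over, pct = check_partition(index_path)
    except FileNotFoundError:
        print(f"{index_path} not found; set index_path to your index partition")
    else:
        status = "WARNING" if over else "OK"
        print(f"{status}: {index_path} is {pct:.1f}% full")
```

Run it from cron (or a Splunk scripted input) on each indexer and send the WARNING output to whatever alerting you already use.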