Splunk forwarder not forwarding all data
Problem Summary:
A customer was running 2 indexers. One failed and all logs were not being forwarded to the active indexer. Customer checked logs submitted for the indexer and a number of forwarders and the issue appeared to the customer to be occurring from only one forwarder. The other forwarders were reportedly working fine.
Working hypotheses:
One working hypothesis to this issue is that there may be a network issue (possibly a firewall) that is preventing data from getting from the forwarder to the indexer.
A secondary hypothesis is that there is a load-balancing issue that may be addressed with 'forceTimebasedAutoLB' setting in forwarder's outputs.conf.
How to troubleshoot:
(1.) If the deployment is either Linux or Windows, telnet from the host to the receiver to confirm whether there is an established connection, as seen below:
telnet xx.xx.xx.xx 9996 (dest. IP & dest. port) (if "connection failed" error is thrown, there is typically a firewall impeding traffic)
If the deployment is Linux, run 'iptables -L -n' to find detailed information of source and destination IP addresses along with any firewall ports. 'This will return the current set of rules. There can be a few rules set here even if your firewall rules haven't been applied. Just look for lines that match your given rulesets. This will give you an idea of what rules have been entered to the system.'
If the deployment is Solaris, run 'netstat -un -P tcp' or 'netstat -un -P udp'
(2.) If no firewalls are found, enable 'forceTimebasedAutoLB' setting in HF/UF outputs.conf and confirm that traffic is being properly load balanced between receivers.
Resolution:
We confirmed that there were no firewall rules preventing data from getting to active indexer after primary indexer outage.
I engaged in a WebEx call with customer in which 'forceTimebasedAutoLB' setting was discussed in an effort to cause the forwarder to send data to an active indexer in any event that an indexer experiences an outage.
We tested the configuration of the 'forceTimebasedAutoLB' parameter in outputs.conf with the stoppage of a primary indexer and found expected logs that once were not being received.
The customer formerly reported resolution to the issue.
... View more