I upgraded to 4.3.3 on an indexer that never had any problems before this point, and now the indexer is dropping all forwarded events on the floor with messages like this:
07-11-2012 12:44:17.568 -0500 INFO TcpInputProc - Stopping IPv4 port 9997
07-11-2012 12:44:17.568 -0500 WARN TcpInputProc - Stopping all listening ports. Queues blocked for more than 300 seconds
I've seen similar questions on Splunk Answers, but the suggested resolutions (involving the fishbucket) don't seem to apply to my case.
I turned on Splunk debugging, but it doesn't lead me to any better conclusions.
What queues is it referring to? The box has plenty of CPU, disk, and RAM. It can't possibly be overloaded; it's not doing anything.
Support is being a lame duck, taking their time staring at walls. In the meantime my primary Splunk indexer is not indexing anything because it's not receiving anything from the forwarders.
Does anyone have any clues as to where I could look? If it's not resolved by tomorrow I'm re-installing Splunk on the primary indexer, as this is not something that can wait.
Thanks in advance for any help and guidance you can provide.
The queues that are mentioned by that message are those that feed the data pipelines where splunkd shapes your data into events before indexing them on disk.
This message would indicate that there is a bottleneck in one of those pipelines, which causes the queue that feeds it and all queues upstream to fill up, all the way to the queue that accepts incoming events from forwarders (splunktcpin).
This is obviously undesirable, but keep in mind that your forwarder events are not being dropped. Instead, the forwarders will pause their data inputs and resume once the indexer is able to process data again.
When you see this message, the first thing to do is determine the fill percentage of the queues feeding the four main data pipelines: parsing -> merging -> typing -> indexing.
By identifying the most downstream queue that is saturated, you can get an idea of where the bottleneck is.
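One quick way to check queue fill from the search bar is to query the indexer's own internal metrics log. This is a rough sketch, assuming the group=queue entries in metrics.log carry current_size_kb and max_size_kb fields (names can vary slightly between versions):

index=_internal source=*metrics.log group=queue
| eval fill_pct = round(current_size_kb / max_size_kb * 100, 1)
| timechart span=1m perc90(fill_pct) by name

The most downstream queue that stays pinned near 100% (indexqueue being the furthest downstream) is usually where the bottleneck lives; everything upstream of it will report as full too.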
A simple way to gain visibility of the state of event-processing queues is to use the "indexing performance" view of the Splunk on Splunk app. For details on how to install the app, check this Splunk Answer.
If you can post a screenshot showing the panels of that view, I can try to help you further.
Incidentally, what is the case number that you opened with Splunk support? I can check in on it for you.
That was the issue for me: it was looping back to itself.
What was the issue? Please help, we are facing the same issue. If this is resolved, can you please post a snippet of your inputs.conf and outputs.conf files?
I try to avoid staring at walls whenever I can 😉
It's a shot in the dark without more information, but I had this issue before. Are you using the deployment server in your environment? Is it possible your forwarders' outputs.conf got deployed to your indexer?
On the indexer:
./splunk cmd btool outputs list --debug
See if you're somehow looping the indexer's output back into its own input.
That would be consistent with the high-level symptom described.
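To illustrate what to look for, a looping configuration on the indexer would look roughly like the following; the group name and hostname here are placeholders, not your actual settings:

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer.example.com:9997

If btool shows a [tcpout] stanza like that on the indexer itself, pointing at the same host and port as its own [splunktcp://9997] input, the indexer is forwarding its own data back to itself and the queues will block. Removing that outputs.conf from the indexer (or scoping the deployment server's serverclass so it no longer targets the indexer) breaks the loop.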
Sideview Utils. Next time I'll read before asking.
It appears I have it installed, but when I go to use it, I get this error:
Splunk encountered the following unknown module: "sosFTR" . The view may not load properly