First, some quick background about this tip.
All these machines run Windows, so from a UF node we used PowerShell to test the port on the HF:
$(New-Object Net.Sockets.TcpClient).Connect("10.xx.xx.xx",9997)
If that command succeeds it immediately returns a good old C: prompt, but if it fails it throws an error after a few seconds. In our case it was unsuccessful. Grrr.
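By the way, if you want something a bit more self-documenting than the raw TcpClient call, here is a sketch of the same test with an explicit timeout (the address is a placeholder; on Windows 8.1 / Server 2012 R2 and later you can also just use Test-NetConnection):

$targetHost = "10.xx.xx.xx"   # placeholder -- your HF address
$client = New-Object Net.Sockets.TcpClient
try {
    # BeginConnect plus a wait handle gives us a 3-second timeout
    $async = $client.BeginConnect($targetHost, 9997, $null, $null)
    if ($async.AsyncWaitHandle.WaitOne(3000) -and $client.Connected) {
        Write-Host "Port 9997 on $targetHost is reachable"
    } else {
        Write-Host "Port 9997 on $targetHost did NOT answer within 3 seconds"
    }
} finally {
    $client.Close()
}

# Or, on recent Windows versions:
# Test-NetConnection -ComputerName $targetHost -Port 9997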
netstat -an
showed that 9997 was listening on the HF. Grrr.
Firewall guys said everything was cruising through unfettered. Grrr.
After growling for a bit and questioning the sanity of the firewall guys, I looked at the indexer. Yup, it was running. I looked again and found this:
There was 9997 listening on the indexer...
PS C:\Windows\system32> netstat -an | findstr "9997"
TCP 0.0.0.0:9997 0.0.0.0:0 LISTENING
TCP 10.54.54.70:9997 10.54.52.85:60353 ESTABLISHED
TCP 10.54.54.70:9997 10.54.54.32:52020 ESTABLISHED
TCP 10.54.54.70:9997 10.54.54.32:52315 CLOSE_WAIT
TCP 10.54.54.70:9997 10.54.54.33:51987 ESTABLISHED
TCP 10.54.54.70:9997 10.54.54.33:52202 CLOSE_WAIT
TCP 10.54.54.70:9997 10.54.54.33:52203 CLOSE_WAIT
TCP 10.54.54.70:9997 10.54.54.34:63000 ESTABLISHED
But wait a minute... it isn't...
PS C:\Windows\system32> netstat -an | findstr "LISTEN"
TCP 0.0.0.0:135 0.0.0.0:0 LISTENING
TCP 0.0.0.0:445 0.0.0.0:0 LISTENING
TCP 0.0.0.0:3389 0.0.0.0:0 LISTENING
TCP 0.0.0.0:5985 0.0.0.0:0 LISTENING
TCP 0.0.0.0:8089 0.0.0.0:0 LISTENING
TCP 0.0.0.0:8191 0.0.0.0:0 LISTENING
TCP 0.0.0.0:9887 0.0.0.0:0 LISTENING
TCP 0.0.0.0:10000 0.0.0.0:0 LISTENING
TCP 0.0.0.0:47001 0.0.0.0:0 LISTENING
TCP 0.0.0.0:49152 0.0.0.0:0 LISTENING
TCP 0.0.0.0:49153 0.0.0.0:0 LISTENING
TCP 0.0.0.0:49154 0.0.0.0:0 LISTENING
TCP 0.0.0.0:49155 0.0.0.0:0 LISTENING
TCP 0.0.0.0:49183 0.0.0.0:0 LISTENING
TCP 0.0.0.0:49198 0.0.0.0:0 LISTENING
Well.
So, the heavy forwarder accepted my incoming PowerShell connection and routed it right over to the indexer, where it failed. I bounced the indexer and, like magic, it was fixed.
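For anyone who wants the concrete version of "bounced": on a Windows indexer that is just a Splunk restart, and either of these should do it (the service name and install path are assumptions based on a default install, so adjust for yours):

# Restart the Splunk service (named "Splunkd" on current Windows installs)
Restart-Service -Name Splunkd

# Or use the Splunk CLI, assuming the default install path
& "C:\Program Files\Splunk\bin\splunk.exe" restart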
I like to share the strange, silly and stupid things I notice, so maybe this will help someone somewhere keep from staring at their screen in confusion for 30 minutes like I did today.
We recently saw the same behavior on Linux indexers. The situation was so severe that we ended up creating a monitoring page for port 9997 across all the indexers. If the port is blocked and doesn't open again in a timely manner, our procedure is to bounce the indexer.
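A minimal sketch of that kind of check in PowerShell, in case it's useful to anyone (the indexer names are placeholders; it just attempts a TCP connect to 9997 on each one and flags the ones that don't answer):

$indexers = @("idx1.example.com", "idx2.example.com")   # placeholders
foreach ($idx in $indexers) {
    $client = New-Object Net.Sockets.TcpClient
    try {
        $async = $client.BeginConnect($idx, 9997, $null, $null)
        if (-not ($async.AsyncWaitHandle.WaitOne(3000) -and $client.Connected)) {
            Write-Warning "Port 9997 not answering on $idx -- candidate for a bounce"
        }
    } finally {
        $client.Close()
    }
}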
The root cause, in our case, was that the indexing queues had filled up; after making them bigger, the situation is much better.
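For reference, the queue sizes live in server.conf on the indexer, so "making them bigger" looks roughly like this (the queue name and size are illustrative only, not recommendations -- size them for your own environment):

# $SPLUNK_HOME/etc/system/local/server.conf
[queue=indexQueue]
maxSize = 500MB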
We should probably open an enhancement request for this type of behavior.
Let me guess: the indexers are running Windows, right?

And you skipped your monthly reboot!
LOL it was rebooted exactly 14 days earlier.
Oh, yeah. Wheeeeeee.
I think I am slowly making headway on a campaign to go Linux. Wish me luck!
Just sharing. 🙂