The bin/splunk show shcluster-status command also fails from time to time, either showing all members as down or returning:
"Failed to proxy call to member https://xxxxxxx:8089.
Encountered some errors while trying to obtain shcluster status."
We have an 11-node search head cluster. Each member is a physical server with 28 threads and 128 GB of RAM, running fully patched Oracle Linux 7.6.
The limits for the splunk user are set high:
splunk@xxxxxxxx:~$ cat /proc/194162/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             200000               200000               processes
Max open files            200000               200000               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       514519               514519               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
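In case it helps compare these limits across all 11 members, here is a minimal Python sketch that parses the text of /proc/<pid>/limits into a dictionary. It is not part of Splunk; the function name parse_limits is hypothetical, and the sketch assumes the columns are separated by runs of two or more spaces, as in the kernel's real output.

```python
import re
import sys


def parse_limits(text):
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard, units)}.

    Assumes columns are separated by two or more spaces, so limit names
    that contain single spaces (e.g. "Max open files") stay intact.
    """
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) < 3:
            continue  # blank or malformed line
        name, soft, hard = parts[0], parts[1], parts[2]
        # Some rows (nice priority, realtime priority) have no units column.
        units = parts[3] if len(parts) > 3 else ""
        limits[name] = (soft, hard, units)
    return limits


if __name__ == "__main__":
    # Usage (Linux only): python parse_limits.py <pid>
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/limits") as fh:
        for name, (soft, hard, units) in parse_limits(fh.read()).items():
            print(f"{name}: soft={soft} hard={hard} {units}".rstrip())
```

Running it against the splunkd PID on each member and diffing the output is one way to confirm that every member really has the same limits applied.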