
SplunkWeb becomes unresponsive

abonuccelli_spl
Splunk Employee

Hi,

We have tens of analysts looking at Splunk, and several TV screens automatically cycling through some dashboards. We're not indexing a great deal of data, but once every few days the splunkweb process stops returning any content. The process is still up and running, yet nothing comes back.

The Python process shows up as quite busy in top output, and netstat shows many connections in CLOSE_WAIT. What is happening?


tcp 427 0 192.168.42.23:8000 10.75.23.2.12:52592 CLOSE_WAIT
tcp 429 0 192.168.42.23:8000 10.72.10.4.115:49916 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.72.10.4.100:58018 CLOSE_WAIT
tcp 429 0 192.168.42.23:8000 10.72.10.4.115:49934 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.72.10.4.100:58014 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.12:52583 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.11:45887 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.72.10.4.100:58044 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.11:45885 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.12:52586 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.12:52585 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.6:49827 CLOSE_WAIT
tcp 429 0 192.168.42.23:8000 10.72.10.4.115:49933 CLOSE_WAIT
tcp 429 0 192.168.42.23:8000 10.72.10.4.115:49919 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.6:49841 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.6:49849 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.6:49844 CLOSE_WAIT
tcp 429 0 192.168.42.23:8000 10.72.10.4.115:49924 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.6:49842 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.75.23.2.12:52589 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.72.10.4.100:58043 CLOSE_WAIT
tcp 429 0 192.168.42.23:8000 10.72.10.4.115:49920 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.72.10.4.100:58064 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.72.10.4.100:58063 CLOSE_WAIT

Individual CLOSE_WAIT connections don't sit there for long, but they seem to build up over time.
Running pstack shows the number of threads fluctuating above 20, with peaks of 70-80.
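
To check whether the CLOSE_WAIT sockets really accumulate, and which clients they belong to, a small script can tally them per remote host from netstat output. This is a minimal sketch assuming the six-column layout shown above (proto, Recv-Q, Send-Q, local, foreign, state) and SplunkWeb listening on port 8000; the function name close_wait_by_peer is hypothetical. Running it periodically and comparing the counts would show whether the build-up is concentrated on a few clients (for example the TV screens).

```python
from collections import Counter


def close_wait_by_peer(netstat_text: str, local_port: int = 8000) -> Counter:
    """Count CLOSE_WAIT sockets on local_port, grouped by remote host."""
    counts: Counter = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local-address foreign-address state
        if len(fields) < 6 or fields[5] != "CLOSE_WAIT":
            continue  # skips header lines and sockets in other states
        local, foreign = fields[3], fields[4]
        if not local.endswith(f":{local_port}"):
            continue
        host, _, _port = foreign.rpartition(":")  # split off the client port
        counts[host] += 1
    return counts


sample = """\
tcp 427 0 192.168.42.23:8000 10.75.23.2.12:52592 CLOSE_WAIT
tcp 429 0 192.168.42.23:8000 10.72.10.4.115:49916 CLOSE_WAIT
tcp 427 0 192.168.42.23:8000 10.72.10.4.100:58018 CLOSE_WAIT
tcp 429 0 192.168.42.23:8000 10.72.10.4.115:49934 CLOSE_WAIT
"""

for host, n in close_wait_by_peer(sample).most_common():
    print(host, n)
```

In practice the sample string would be replaced by the captured output of netstat on the Splunk host.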


abonuccelli_spl
Splunk Employee

If for any reason connections are not closed in time, or there are simply too many of them, the Splunk web server (CherryPy) will spend most of its time creating and destroying threads.

If your server has enough CPU, memory, and network resources, you might consider increasing the default values for the web thread pool. The defaults are stored in $SPLUNK_HOME/etc/system/default/web.conf:


server.thread_pool = 20
server.thread_pool_max = -1
server.thread_pool_min_spare = 5
server.thread_pool_max_spare = 10

Never edit these values in a default file; instead, add the properties you want to change to $SPLUNK_HOME/etc/system/local/web.conf, for example:


[settings]
server.thread_pool = 50
server.thread_pool_min_spare = 10
server.thread_pool_max_spare = 20
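
The default/local layering can be illustrated with Python's configparser, whose INI format is close to (but not identical to) Splunk's .conf syntax. This is a rough sketch assuming the thread pool keys live in the [settings] stanza of web.conf and that only these two files matter; the helper name effective_settings is hypothetical. On a real instance, `splunk btool web list --debug` is the authoritative way to see the merged values and which file each one comes from, since apps can also contribute settings.

```python
import configparser


def effective_settings(default_text: str, local_text: str, stanza: str = "settings") -> dict:
    """Approximate Splunk's default/local layering for a single stanza.

    Layers are read in order, so a key in local_text overrides the same key
    from default_text; keys present only in default_text are kept.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case instead of lowercasing it
    parser.read_string(default_text, source="default/web.conf")
    parser.read_string(local_text, source="local/web.conf")
    return dict(parser[stanza]) if parser.has_section(stanza) else {}


default_conf = """
[settings]
server.thread_pool = 20
server.thread_pool_max = -1
server.thread_pool_min_spare = 5
server.thread_pool_max_spare = 10
"""

local_conf = """
[settings]
server.thread_pool = 50
server.thread_pool_min_spare = 10
server.thread_pool_max_spare = 20
"""

for key, value in effective_settings(default_conf, local_conf).items():
    print(f"{key} = {value}")
```

With the example local file above, only server.thread_pool_max keeps its default value; the other three keys take the local values.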

Make these changes only if the symptoms very closely match the description above and enough hardware resources are available, because CherryPy is expensive, especially in terms of CPU usage.
Ultimately, if a large number of clients are not closing connections properly (or a firewall in between is preventing connections from terminating gracefully), the threads in the web server pool will be starved at some point anyway, and increasing the thread pool size might make things worse.
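
The starvation effect can be reproduced in miniature with Python's standard thread pool. This is only an analogy: ThreadPoolExecutor is not CherryPy's pool, and POOL_SIZE = 20 simply mirrors the default server.thread_pool. Each "stuck" task stands in for a connection that a client never finishes; once every worker holds one, even a trivial new request has to wait.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

POOL_SIZE = 20  # mirrors the default server.thread_pool value (analogy only)


def stuck_request(client_gone: threading.Event) -> str:
    # Models a worker tied up by a connection that is never completed:
    # the worker blocks until the event is set.
    client_gone.wait()
    return "released"


def quick_request() -> str:
    return "ok"


release = threading.Event()
with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    # Occupy every worker with a "stuck" connection.
    stuck = [pool.submit(stuck_request, release) for _ in range(POOL_SIZE)]
    fresh = pool.submit(quick_request)
    done, not_done = wait([fresh], timeout=0.5)
    starved = fresh in not_done  # no free worker, so the cheap request is still queued
    release.set()  # simulate the stuck connections finally closing
    result = fresh.result(timeout=5)

print("starved while pool was full:", starved)
print("after release:", result)
```

Raising POOL_SIZE only changes how many stuck connections it takes to reach the same state, which is why a bigger pool does not cure misbehaving clients.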

If you don't have a thorough understanding of what you are doing, contact Splunk support first.


abonuccelli_spl
Splunk Employee

That was also applied and ruled out; it did not solve the problem, which wasn't about waiting on DNS resolution.
In this case, a Chrome plugin called Revolver, used on the TV screens, was simply flooding SplunkWeb with too many requests.
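
A back-of-the-envelope estimate shows how a dashboard rotation tool can saturate a small thread pool. It uses Little's law (average requests in flight = arrival rate × average response time). All inputs below (screen count, requests per dashboard load, rotation interval, response time) are hypothetical placeholders, not measurements from this case; real values would come from the SplunkWeb access logs. The RotationLoad class is likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class RotationLoad:
    screens: int                 # TV screens cycling dashboards (hypothetical)
    requests_per_load: int       # HTTP requests per dashboard load (hypothetical)
    rotation_seconds: float      # how often each screen switches/reloads a dashboard
    avg_response_seconds: float  # mean time a worker thread spends on one request

    def arrival_rate(self) -> float:
        """Requests per second reaching SplunkWeb from the rotating screens."""
        return self.screens * self.requests_per_load / self.rotation_seconds

    def busy_threads(self) -> float:
        """Little's law, L = lambda * W: average number of requests in flight."""
        return self.arrival_rate() * self.avg_response_seconds


load = RotationLoad(screens=8, requests_per_load=40, rotation_seconds=10, avg_response_seconds=0.5)
print(f"{load.arrival_rate():.1f} req/s, ~{load.busy_threads():.1f} busy threads")
# prints: 32.0 req/s, ~16.0 busy threads
```

With these made-up numbers the screens alone would keep about 16 of the default 20 threads busy on average, before counting the analysts' own sessions or bursts when several screens switch at the same moment.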

