Splunk (6.4.2) large cluster.
Splunk Plugin for Jenkins 1.3.1
I have the Splunk plugin on 4 Jenkins masters. One of the masters stopped sending data on Sunday (14 days since last restart of Jenkins) and I can't establish the connection again. The other 3 masters are still working and my curl test to the HTTP collector works from all 4 masters.
Jenkins log entry from working master.
Dec 06, 2016 10:44:01 AM com.splunk.splunkjenkins.utils.LogConsumer run
at com.splunk.splunkjenkins.utils.LogConsumer.run(LogConsumer.java:84)
Jenkins log from master that is not working.
Dec 06, 2016 3:50:06 AM com.splunk.splunkjenkins.utils.LogConsumer run
at com.splunk.splunkjenkins.utils.LogConsumer$1.handleResponse(LogConsumer.java:63)
at com.splunk.splunkjenkins.utils.LogConsumer$1.handleResponse(LogConsumer.java:43)
at com.splunk.splunkjenkins.utils.LogConsumer.run(LogConsumer.java:84)
I'm looking for something to try without restarting Jenkins (it's a critical production master).
I was able to get SSH access to the system and did the following to solve my problem. The solution is taken from the "Removing and disabling plugins" wiki page.
touch /var/lib/jenkins/plugins/splunk-devops.jpi.disabled
touch /var/lib/jenkins/plugins/splunk-devops-extend.jpi.disabled
Then, I rebooted. From the documentation, I assume that this is possible with any plugin.
You can get more information here: https://wiki.jenkins-ci.org/display/JENKINS/Removing+and+disabling+plugins
The error indicates that the HTTP Event Collector is temporarily out of service, most often because of blocked queues. The reason for the blocking can be found with this query:
index=_internal blocked
Or via grep
grep blocked $SPLUNK_HOME/var/log/splunk/metrics.log
There is also a wiki page for troubleshooting blocked queues: https://wiki.splunk.com/Community:TroubleshootingBlockedQueues
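The plain grep returns every matching line, which can be a lot on a busy indexer. As a rough sketch, the matches can be counted per queue to see which one blocks most often. This assumes metrics.log uses the usual `group=queue, name=<queue>, blocked=true` key-value layout; the function name `blocked_by_queue` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count blocked-queue events per queue name.
# Assumes lines like "... group=queue, name=indexqueue, blocked=true, ...".
blocked_by_queue() {
    grep 'blocked=true' "$1" \
        | sed -n 's/.*name=\([^,]*\),.*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Run it as `blocked_by_queue "$SPLUNK_HOME/var/log/splunk/metrics.log"`; the queue at the top of the list is the first one to investigate.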
Came in this morning and Jenkins/Splunk logging had stopped again after working OK for about a week on the same master.
java.io.IOException: failed to send data,Service Unavailable
When I test the connection I get:
token:xxxxxxxxxxxxxxxxxxxxxxxx response:Service Unavailable
Still working from other 3 masters.
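For comparing the four masters, it may help to run the same probe from each one and look only at the HTTP status code. The sketch below is an illustration, not the plugin's own test: the function names are made up, the URL and token are placeholders, and `-k` (skip certificate verification) assumes a self-signed certificate on the collector.

```shell
#!/bin/sh
# Hypothetical helper: send one test event to the HTTP Event Collector
# and print only the HTTP status code.
# $1 = base URL (placeholder, e.g. https://splunk.example.com:8088), $2 = HEC token
hec_probe() {
    curl -k -s -o /dev/null -w '%{http_code}' \
        -H "Authorization: Splunk $2" \
        -d '{"event": "hec connectivity test"}' \
        "$1/services/collector"
}

# Hypothetical helper: map the status code to a likely cause.
hec_explain() {
    case "$1" in
        200) echo "ok" ;;
        401|403) echo "token problem" ;;
        503) echo "server busy (queues may be blocked)" ;;
        000) echo "no HTTP response (network or TLS problem)" ;;
        *) echo "unexpected status $1" ;;
    esac
}
```

For example, `hec_explain "$(hec_probe https://splunk.example.com:8088 "$TOKEN")"` run on the failing master and on a working one shows whether both see the same response at the same moment.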
There should be a more verbose error message about the reason. Can you look in the logs for something like
"failed to send data, reason phase".
If the reason is something like "connect reset", can you try increasing the number of "Retries on Error" in the Advanced section?
If the reason is "service unavailable" and you are using a heavy forwarder to forward data to Splunk servers across a WAN, please adjust maxQueueSize in outputs.conf. See also https://docs.splunk.com/Documentation/Splunk/6.4.2/Admin/Outputsconf.
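To pull the failure reasons out of the Jenkins log as suggested above, something like the following sketch could be used. The function name is made up, and the log path in the usage note is an assumption that depends on how Jenkins was installed.

```shell
#!/bin/sh
# Hypothetical helper: summarize "failed to send data" messages by reason.
# Uses grep -o (GNU/BSD grep) to keep only the matching part of each line.
send_failure_reasons() {
    grep -o 'failed to send data.*' "$1" | sort | uniq -c | sort -rn
}
```

Run it as `send_failure_reasons /var/log/jenkins/jenkins.log` (path assumed). A count dominated by "Service Unavailable" points at the collector side; a mix of connection errors points at the network between Jenkins and Splunk.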
@hal.boggess did you have a chance to check the error details?