I have a Python scripted input on a Splunk UF that calls a Kafka bin script (bin/kafka-consumer-groups.sh) and re-formats its output into Splunk-friendly key=value format.
Sometimes a broker is unavailable and the Kafka bin script emits a WARN message about it.
After this "failed" script run, the Splunk UF stops calling the Python scripted input until I restart the forwarder.
I turned on category.ExecProcessor=DEBUG in log-local.cfg and this is what I see:
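For reference, that is the only line I added to $SPLUNK_HOME/etc/log-local.cfg (the DEBUG events below then show up in splunkd.log):

category.ExecProcessor=DEBUG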
Normal process run
09-26-2016 17:31:56.559 -0700 DEBUG ExecProcessor - ExecProcessorSharedState::addToRunQueue() path='python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py' restartTimerIfNeeded=1
09-26-2016 17:31:56.559 -0700 DEBUG ExecProcessor - adding "python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py" to runqueue
09-26-2016 17:31:56.559 -0700 DEBUG ExecProcessor - cmd='python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py' Added to run queue
09-26-2016 17:31:58.397 -0700 DEBUG ExecProcessor - Running: python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py on PipelineSet 0
09-26-2016 17:31:58.397 -0700 DEBUG ExecProcessor - PipelineSet 0: Created new ExecedCommandPipe for "python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py", uniqueId=18390
09-26-2016 17:32:08.260 -0700 DEBUG ExecProcessor - PipelineSet 0: Got EOF from "python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py", uniqueId=18390
09-26-2016 17:32:08.260 -0700 DEBUG ExecProcessor - PipelineSet 0: Sending 'done' key for "python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py", uniqueId=18390
09-26-2016 17:32:08.260 -0700 DEBUG ExecProcessor - PipelineSet 0: Ran script: python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py, took 9.863047 seconds to run, 10060 bytes read
09-26-2016 17:32:08.260 -0700 DEBUG ExecProcessor - PipelineSet 0: Destroying ExecedCommandPipe for "python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py" id=18390
09-26-2016 17:32:08.260 -0700 DEBUG ExecProcessor - cmd='python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py' Not added to run queue
Process run with a WARN message, after which the Splunk UF never runs the script again
09-26-2016 17:32:56.602 -0700 DEBUG ExecProcessor - ExecProcessorSharedState::addToRunQueue() path='python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py' restartTimerIfNeeded=1
09-26-2016 17:32:56.602 -0700 DEBUG ExecProcessor - adding "python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py" to runqueue
09-26-2016 17:32:56.602 -0700 DEBUG ExecProcessor - cmd='python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py' Added to run queue
09-26-2016 17:32:58.444 -0700 DEBUG ExecProcessor - Running: python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py on PipelineSet 0
09-26-2016 17:32:58.445 -0700 DEBUG ExecProcessor - PipelineSet 0: Created new ExecedCommandPipe for "python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py", uniqueId=18393
09-26-2016 17:33:06.086 -0700 ERROR ExecProcessor - message from "python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py" [2016-09-26 17:33:06,084] WARN Bootstrap broker broker11daysm.testdomain.com:7071 disconnected (org.apache.kafka.clients.NetworkClient)
09-26-2016 17:33:56.638 -0700 DEBUG ExecProcessor - cmd='python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py' Not added to run queue
09-26-2016 17:34:56.674 -0700 DEBUG ExecProcessor - cmd='python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py' Not added to run queue
09-26-2016 17:35:56.700 -0700 DEBUG ExecProcessor - cmd='python /opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/bin/kafka-new-consumer-groups.py' Not added to run queue
etc...
I never receive another "ExecProcessorSharedState::addToRunQueue()" message, and I have to restart the Splunk UF to clear the "hung" state of this scripted input.
While the Splunk UF is in this state, I can run the kafka-new-consumer-groups.py script manually without any issues.
Comparing the two runs, the failed one never logs the "Got EOF" / "Sending 'done' key" / "Destroying ExecedCommandPipe" sequence for uniqueId=18393, so it looks like splunkd still believes that invocation is running and therefore refuses to re-queue it.
Even so, I don't understand why the Splunk UF would care about the WARN message, or why it never adds the scripted input back to its run queue on the next scheduled run.
Any ideas?
I was unable to find a solution to this, so I abandoned trying to resolve it with the Splunk "script" inputs.conf stanza.
We got around the problem by calling the script from a cron task (more accurately, via Nagios), dumping its output to an intermediate file, and then sending that file to the Splunk UF REST API "oneshot" service.
curl -k -u admin:changeme -d "name=/usr/local/nagios/tmp/consumerlag.perf" -d "index=${ENV}_kafkaconsumer" -d "sourcetype=consumer_metrics" -d "rename-source=/usr/local/nagios/libexec/keyn-consumerlag.sh" https://localhost:8089/services/data/inputs/oneshot > /usr/local/nagios/tmp/curl.out 2>&1
(I know we could send it directly to the Splunk indexer but then we lose the local queueing and indexer load balancing that we achieve by sending the script output to the local UF REST API.)
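In case a worked example helps, here is the same pipeline sketched in Python (the paths, credentials, and index name are placeholders; we actually drive it with the shell/curl command above from Nagios):

#!/usr/bin/env python
# Sketch: run the consumer-lag script, dump its output to a file,
# then hand that file to the local UF's "oneshot" REST endpoint.
import subprocess
import requests  # assumed available; curl works just as well

# Placeholder values
SCRIPT = "/usr/local/nagios/libexec/keyn-consumerlag.sh"
PERF_FILE = "/usr/local/nagios/tmp/consumerlag.perf"
UF_ONESHOT = "https://localhost:8089/services/data/inputs/oneshot"

def main():
    # 1. Run the collection script and dump its output to a file.
    with open(PERF_FILE, "w") as f:
        subprocess.check_call([SCRIPT], stdout=f)

    # 2. Ask the local UF to index that file once. The UF then queues
    #    it and load-balances it across indexers as usual.
    resp = requests.post(
        UF_ONESHOT,
        auth=("admin", "changeme"),  # placeholder credentials
        verify=False,                # self-signed UF cert, like curl -k
        data={
            "name": PERF_FILE,
            "index": "prod_kafkaconsumer",  # i.e. ${ENV}_kafkaconsumer
            "sourcetype": "consumer_metrics",
            "rename-source": SCRIPT,
        },
    )
    resp.raise_for_status()

if __name__ == "__main__":
    main()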