We're using the Splunk App for AWS, and have been doing some customizations to better suit our needs. I've run into a strange problem though. One of the scripts connects to an Amazon S3 bucket in order to download some billing information, but when Splunk runs the script automatically on its schedule, it fails to index any data and logs this error:
get_bill.py: Traceback (most recent call last):
get_bill.py:   File "/opt/splunk/etc/apps/SplunkAppforAWS/bin/get_bill.py", line 65, in <module>
get_bill.py:     a = conn.create_bucket(s3bucket1)
get_bill.py:   File "/opt/splunk/etc/apps/SplunkAppforAWS/bin/boto/s3/connection.py", line 432, in create_bucket
get_bill.py:     data=data)
get_bill.py:   File "/opt/splunk/etc/apps/SplunkAppforAWS/bin/boto/s3/connection.py", line 468, in make_request
get_bill.py:     override_num_retries=override_num_retries)
get_bill.py:   File "/opt/splunk/etc/apps/SplunkAppforAWS/bin/boto/connection.py", line 910, in make_request
get_bill.py:     return self._mexe(http_request, sender, override_num_retries)
get_bill.py:   File "/opt/splunk/etc/apps/SplunkAppforAWS/bin/boto/connection.py", line 872, in _mexe
get_bill.py:     raise e
get_bill.py: socket.error: [Errno 111] Connection refused
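For reference, errno 111 is ECONNREFUSED: the connection was actively rejected by the target (or by something in between, such as a proxy or firewall), rather than timing out or failing DNS. A minimal sketch that reproduces the same error, assuming nothing is listening on the chosen local port:

```python
import errno
import socket

# Connecting to a local port with no listener raises the same
# "[Errno 111] Connection refused" seen in the traceback above.
try:
    socket.create_connection(("127.0.0.1", 1), timeout=2)
except socket.error as e:
    print(e.errno == errno.ECONNREFUSED)
```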
But, if I run the script manually on the server, using the following command, it runs perfectly and all the data is printed out:
$SPLUNK_HOME/bin/splunk cmd python $SPLUNK_HOME/etc/apps/SplunkAppforAWS/bin/get_bill.py
Which doesn't make sense to me. Has anyone seen anything like this before?
I know we have been updating our Splunk instances lately, and there have been some network changes that could be affecting this, but I haven't had a chance to check whether there's any correlation. What baffles me is that the command works when I run it manually but fails when Splunk runs it.
Could you check which account Splunk runs under? You could try running the script as the owner of the Splunk process. It may be an access-related issue rather than a network one.
Splunk runs under the 'splunk' user. I became the 'splunk' user to run the script manually when it succeeded.
This sort of behaviour (batch doesn't work; interactive does) is often a PATH issue. When you log in as the splunk user, there is no guarantee that your PATH is the same as the one available to batch jobs/services, particularly if you have customizations in your /etc/profile, the splunk account's .profile, or any of the possible Bash rc files.
Set yourself a scheduled task that runs a shell and writes its environment to a file (e.g. "set > set.bg"), then do the same in an interactive shell ("set > set.fg"), and diff the two to find any significant differences in PATH or other environment variables.
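A minimal sketch of that comparison in Python, since the failing script is Python anyway. The file names are the ones suggested above, but the stripped "batch" environment here is simulated; in practice /tmp/set.bg would be written by the scheduled job itself:

```python
import os

# Dump an environment mapping to a file, one KEY=value per line.
def dump_env(env, path):
    with open(path, "w") as f:
        for key in sorted(env):
            f.write("%s=%s\n" % (key, env[key]))

dump_env(os.environ, "/tmp/set.fg")                 # interactive environment
dump_env({"PATH": "/usr/bin:/bin"}, "/tmp/set.bg")  # simulated batch environment

# Report differing lines that mention the usual suspects:
# PATH and proxy-related variables.
fg, bg = set(open("/tmp/set.fg")), set(open("/tmp/set.bg"))
for line in sorted(fg ^ bg):
    if any(s in line.lower() for s in ("path=", "proxy")):
        print(line.rstrip())
```

Proxy variables (http_proxy, https_proxy, no_proxy) are worth including in the comparison here, since a proxy present in one environment but not the other is a classic cause of "connection refused" from only one of the two contexts.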
Thanks for the suggestion; I had already verified that the two environments were the same.
Well, something must have gotten messed up during our updates, I think. Restarting the box resolved the issue.