Hi guys,
I'm working on deploying a cluster and I have a little problem.
Everything's ok on the "connectivity" side between my master instance and my three "slave" indexers.
The only thing that isn't working properly is pushing my master apps to my slave apps folders on the indexers.
I have created a sample indexes.conf on the master instance in:
/opt/splunk_master/etc/master-apps/myapp/local/indexes.conf
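For what it's worth, it's just a bare-bones test index, something along these lines (the index name and paths here are placeholders, not necessarily my real values):

    # hypothetical sample stanza; any valid index definition would do for the test
    [test_index]
    homePath   = $SPLUNK_DB/test_index/db
    coldPath   = $SPLUNK_DB/test_index/colddb
    thawedPath = $SPLUNK_DB/test_index/thaweddb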
I have launched splunk apply cluster-bundle and then checked that everything was ok with splunk show cluster-bundle-status.
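For the record, the commands I ran on the master were basically:

    # run from the master install (mine lives in /opt/splunk_master)
    /opt/splunk_master/bin/splunk apply cluster-bundle
    /opt/splunk_master/bin/splunk show cluster-bundle-status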
On the indexers, I can see that /opt/splunk/var/run/splunk/cluster/remote-bundle/ has received the bundle.
I've checked the splunkd logs and I can see something like this:
09-18-2013 15:38:17.176 +0200 ERROR CMSlave - Could not move /opt/splunk/var/run/splunk/cluster/remote-bundle/e6caf729df0cddfc030dedae58eb8a63-1379511473/apps/default_ftpub to /opt/splunk/etc/slave-apps/default_ftpub
and then:
09-18-2013 15:38:17.177 +0200 ERROR CMSlave - Failed to move bundle to slave-apps
I tried performing a manual move from /opt/splunk/var/run/splunk/cluster/remote-bundle/ to /opt/splunk/etc/slave-apps, being logged either as Splunk or as root, but no luck.
I don't know what's going on here.
Any thoughts?
Thanks in advance for your help.
Mat
PS: I'm not using a deployment server at all.
I couldn't solve this, so I went for a fresh install and everything's fine now.
We were using different filesystems and I'm not sure all of them were mounted correctly.
My sysadmin seemed confused, but there's no problem at all now that we've got rid of all the symbolic links and such.
Looks like the issue wasn't Splunk-related at all.
Mat
I would begin by checking file ownership and permissions. You mentioned "being logged either as Splunk or as root"; I suspect that if your runtime user is 'splunk', there's a file or directory somewhere still owned by root that's blocking the installation of new slave-apps.
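As a starting point, something along these lines (assuming the runtime user is 'splunk'; adjust paths and user to your install) would flag any stray root-owned entries on an indexer:

    # list anything under the cluster staging area or slave-apps that is not owned by the splunk user
    find /opt/splunk/var/run/splunk/cluster /opt/splunk/etc/slave-apps ! -user splunk -ls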
Yes.
Thanks for your help sowings.
The Splunk user couldn't, and that was the complaint from the log, no?
Still, how could the splunk user move a bundle created by root and with the following permissions?
drwx------ 4 root root 4096 Sep 18 18:51 250b9a3088043742da0eb3992c987307-1379523088
Splunk is owned by root, on the master + search head and on the slave indexers.
I think the problem is linked to a different filesystem mapped on /opt/splunk/var ... I'll try and change that and see if it makes any difference.
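A quick check I can run on an indexer to see whether those locations really sit on separate filesystems (if they do, a mv between them turns into a copy + delete):

    # different "Mounted on" values here would confirm the cross-filesystem suspicion
    df -h /opt/splunk/etc /opt/splunk/var/run/splunk/cluster/remote-bundle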
I have already disabled SELinux; I'll be looking for other packages tomorrow.
I'll let you know when I have more information; it's already time to have dinner and go to bed in Europe 😉
Thanks a lot for your help today.
No, the OS should not be a problem. A package running on the server (or something like SELinux) might be affecting permissions.
I wouldn't try to shortcut the master-apps / slave-apps process, though. That's critical for the master to know that the indexer has the right config to be able to consider it a valid member of the cluster.
Hmm, I'll have to double-check who the owner of splunkd is.
Another idea ... could the OS (Fedora 19) be a problem? Or any weird package running on the servers? I've tried to get rid of SELinux, firewalld and things like that, but there might be other components to take into account.
I think I might perform a clean install of Splunk again. I'm getting really confused here and have no idea why I'm getting this error.
On the other hand ... using the master-apps / slave-apps process isn't really critical. I only have three slave indexers, so I could just copy/paste the files when I have to ...
Those permissions look fine. The fact that the bundles are arriving at the indexer owned by root, however, now suggests that the Splunk process is running as root on the indexer. Further, you might have a restrictive umask in play, but that's not the critical issue.
Please check the owner of the 'splunkd' process on the indexer; I suspect that it is running as root.
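For example, with a standard procps ps:

    # show which user each splunkd process is running as
    ps -o user,pid,args -C splunkd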
Splunk was started manually as root.
Also, on the master instance, where Splunk was started as root too, here's what I get when I "ll" the directories one by one:
drwxr-xr-x. 5 root root 4096 18 sept. 10:07 opt
drwxr-xr-x 9 splunk splunk 4096 18 sept. 15:35 splunk_master
drwxr-xr-x 15 splunk splunk 4096 18 sept. 10:19 etc
drwxrwxrwx 3 splunk splunk 4096 18 sept. 17:03 master-apps
drwxrwxrwx 4 splunk splunk 4096 18 sept. 12:27 default_ftpub
drwxrwxrwx 2 splunk splunk 4096 18 sept. 18:50 local
-rwxrwxrwx 1 splunk splunk 381 18 sept. 18:50 indexes.conf
Anything wrong in there?
Also, how are you starting Splunk? "splunk start" from the command line, or did you set up boot start with splunk enable boot-start --user <other_user>?
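If boot-start was enabled, there should be an init script on the box; a rough check (the script location is an assumption and can vary by distro) would be:

    # does an init script exist, and does it mention a specific run-as user?
    ls -l /etc/init.d/splunk
    grep -i user /etc/init.d/splunk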
It sounds, then, like the source of these configs--the cluster master--is the one that has the bad configs. Can you check the contents of the master-apps folder on the master?
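For instance, using the path from your first post:

    # owner and permissions for everything under master-apps on the master
    ls -lR /opt/splunk_master/etc/master-apps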
It doesn't look good. Here it goes (the bundle id has changed since last time):
total 52
drwx------ 4 root root 4096 Sep 18 18:21 21f00b93d35dce596a34a98fe8b9e952-1379521303
-rw------- 1 root root 10240 Sep 18 18:21 21f00b93d35dce596a34a98fe8b9e952-1379521303.bundle
-rw------- 1 root root 10240 Sep 18 18:21 37fxxxxxxx.bundle
-rw------- 1 root root 10240 Sep 18 18:21 af2xxxxxxx.bundle
-rw------- 1 root root 10240 Sep 18 18:21 f21c8f9dbad9409bba1e5ea2cafa2621-1379521274.bundle
ls: cannot open directory ./21f00b93d35dce596a34a98fe8b9e952-1379521303: Permission denied
I suspect, then, that there's a file in there, which is root-owned, and therefore can't be removed by the runtime user. Try ls -lR to check perms and ownership on the files in that bundle directory (the 585c15ce10fa204cb7e48dd6330ba68c-1379519945 one).
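Something like this, run on the indexer (the full path is assumed from your log events):

    # recursive listing of that bundle's staging directory, with owner and permissions
    ls -lR /opt/splunk/var/run/splunk/cluster/remote-bundle/585c15ce10fa204cb7e48dd6330ba68c-1379519945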
Actually ...
When logged in as the splunk user in /opt/splunk/var/run/splunk/cluster/remote-bundle:
mv 585c15ce10fa204cb7e48dd6330ba68c-1379519945 /opt/splunk/etc/slave-apps/
mv: inter-device move failed: ‘585c15ce10fa204cb7e48dd6330ba68c-1379519945’ to ‘/opt/splunk/etc/slave-apps/585c15ce10fa204cb7e48dd6330ba68c-1379519945’; unable to remove target: Is a directory
I can cd from / to /opt/splunk/etc/slave-apps but there's nothing after that 😕
In etc, I ran "ll" and here's what I get for slave-apps:
drwxr-xr-x 2 splunk splunk 4096 18 sept. 17:04 slave-apps
Everything looks normal to me, but ... I'm not sure I've been looking in the right place.
Is this what you were asking?
Yes, but for other reasons. 🙂
I'd check permissions on the intervening paths, particularly with an eye to the "other" permissions. The target location is listed as /opt/splunk/etc/slave-apps/default_ftpub in your log events, so check /opt, /opt/splunk, /opt/splunk/etc, etc. for at least "execute" permission on the directory. The Splunk user has to "cd through" each and every directory in the path, and that's governed by the x permission (user, group, or other) that applies to the running user. It doesn't matter if the target dir is 777 if Splunk can't cd there....
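A quick way to eyeball that whole chain, if namei (from util-linux) is available on your boxes:

    # show owner and permissions for every component of the target path
    namei -l /opt/splunk/etc/slave-apps/default_ftpub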
Is it a problem if it's owned by root but chmoded to 777?