Slave-apps folder is always empty

Mahieu
Communicator

Hi guys,

I'm working on deploying a cluster and I have a little problem.
Everything's ok on the "connectivity" side between my master instance and my three "slave" indexers.

The only thing that isn't working properly is pushing my master apps to my slave apps folders on the indexers.

I have created a sample indexes.conf on the master instance in:
/opt/splunk_master/etc/master-apps/myapp/local/indexes.conf
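
For reference, a minimal indexes.conf of that kind looks something like this (the index name here is made up for illustration, not the real one):

[test_index]
homePath   = $SPLUNK_DB/test_index/db
coldPath   = $SPLUNK_DB/test_index/colddb
thawedPath = $SPLUNK_DB/test_index/thaweddb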

I launched splunk apply cluster-bundle and then checked that everything was OK with splunk show cluster-bundle-status.
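
In full, that was roughly the following, run from the master's install directory:

/opt/splunk_master/bin/splunk apply cluster-bundle
/opt/splunk_master/bin/splunk show cluster-bundle-status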

On the indexers, I can see that /opt/splunk/var/run/splunk/cluster/remote-bundle//myapp does exist. Still, it never shows up in /opt/splunk/etc/slave-apps.

I've checked the splunkd logs and I can see something like this:

09-18-2013 15:38:17.176 +0200 ERROR CMSlave - Could not move /opt/splunk/var/run/splunk/cluster/remote-bundle/e6caf729df0cddfc030dedae58eb8a63-1379511473/apps/default_ftpub to /opt/splunk/etc/slave-apps/default_ftpub

and then:

09-18-2013 15:38:17.177 +0200 ERROR CMSlave - Failed to move bundle to slave-apps

I tried performing a manual move from /opt/splunk/var/run/splunk/cluster/remote-bundle//myapp to /opt/splunk/etc/slave-apps and it works (logged in either as the splunk user or as root).
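
Concretely, the manual test was something like this, with <bundle-id> standing in for the actual hash-and-timestamp directory name:

mv /opt/splunk/var/run/splunk/cluster/remote-bundle/<bundle-id>/apps/myapp /opt/splunk/etc/slave-apps/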

I don't know what's going on here.

Any thoughts?

Thanks in advance for your help.

Mat

PS: I'm not using a deployment server at all.


Mahieu
Communicator

I couldn't solve this, so I went for a fresh install and everything's fine now.
We were using different filesystems and I'm not sure all of them were mounted correctly.
My sysadmin seemed confused, but there's no problem at all now that we got rid of all the symbolic links and stuff.

Looks like the issue wasn't Splunk-related at all.

Mat

sowings
Splunk Employee

I would begin by checking file ownership and permissions. You mentioned being "logged either as Splunk or as root"; I suspect that if your runtime user is 'splunk', there's a file or directory somewhere still owned by root that's blocking the installation of new slave-apps.
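
Something along these lines, run as root on an indexer, should surface any root-owned entries (paths assume a default install):

ls -ld /opt /opt/splunk /opt/splunk/etc /opt/splunk/etc/slave-apps
find /opt/splunk/etc /opt/splunk/var/run/splunk/cluster -not -user splunk -ls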

Mahieu
Communicator

Yes.
I couldn't solve this, so I went for a fresh install and everything's fine now.
We were using different filesystems and I'm not sure all of them were mounted correctly.
My sysadmin seemed confused, but there's no problem at all now that we got rid of all the symbolic links and stuff.

Thanks for your help, sowings.

sowings
Splunk Employee

The Splunk user couldn't, and that was the complaint from the log, no?

Mahieu
Communicator

Still, how could the splunk user move a bundle created by root and with the following permissions?
drwx------ 4 root root 4096 Sep 18 18:51 250b9a3088043742da0eb3992c987307-1379523088

Mahieu
Communicator

Splunk is owned by root on the master + search head and on the slave indexers.

I think the problem is linked to a different filesystem being mounted on /opt/splunk/var ... I'll try to change that and see if it makes any difference.
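
For what it's worth, a quick way to see which filesystem each tree actually sits on:

df -h /opt/splunk/etc /opt/splunk/var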

Mahieu
Communicator

I have already disabled SELinux; I'll be looking at other packages tomorrow.
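
Just to double-check that part, getenforce should print Disabled (or at least Permissive):

getenforce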

I'll let you know when I have more information. It's already time to have dinner and go to bed in Europe 😉

Thanks a lot for your help today.

sowings
Splunk Employee

No, the OS itself should not be a problem. A package running on the server (or something like SELinux) might be affecting permissions, though.

I wouldn't try to shortcut the master-apps / slave-apps process. It's critical for the master to know that the indexer has the right config in order to consider it a valid member of the cluster.

Mahieu
Communicator

Hmm, I'll have to double-check who owns splunkd.

Another idea ... could the OS (Fedora 19) be a problem? Or some weird package running on the servers? I've tried to get rid of SELinux, firewalld and things like that, but ... there might be other components to take into account.

I think I might perform a clean install of Splunk again. I'm getting really confused here and have no idea why I'm getting this error.

On the other hand ... using the master-apps / slave-apps process isn't really critical. I have three slave indexers, so I could just copy the files over by hand when I have to ...

sowings
Splunk Employee

Those permissions look fine. The fact that the bundles are arriving at the indexer owned by root, however, now suggests that the Splunk process is running as root on the indexer. You might also have a restrictive umask in play, but that's not the critical issue.

Please check the owner of the splunkd process on the indexer; I suspect that it is running as root.
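
For example, with standard Linux ps:

ps -C splunkd -o user,pid,args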

Mahieu
Communicator

Splunk was started manually as root.

Also, on the master instance, where Splunk was started as root too, here's what I get when I "ll" the directories one by one:
drwxr-xr-x. 5 root root 4096 18 sept. 10:07 opt
drwxr-xr-x 9 splunk splunk 4096 18 sept. 15:35 splunk_master
drwxr-xr-x 15 splunk splunk 4096 18 sept. 10:19 etc
drwxrwxrwx 3 splunk splunk 4096 18 sept. 17:03 master-apps
drwxrwxrwx 4 splunk splunk 4096 18 sept. 12:27 default_ftpub
drwxrwxrwx 2 splunk splunk 4096 18 sept. 18:50 local
-rwxrwxrwx 1 splunk splunk 381 18 sept. 18:50 indexes.conf

Anything wrong in there?

sowings
Splunk Employee

Also, how are you starting Splunk? "splunk start" from the command line, or did you set up boot start with splunk enable boot-start -user <other_user>?
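
For example, run as root (the init script it installs will then start Splunk as the given user):

/opt/splunk/bin/splunk enable boot-start -user splunk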

sowings
Splunk Employee

It sounds, then, like the source of these configs (the cluster master) is the one that has the bad configs. Can you check the contents of the master-apps folder on the master?

Mahieu
Communicator

It doesn't look good. Here it goes (bundle id has changed since last time):

total 52
drwx------ 4 root root 4096 Sep 18 18:21 21f00b93d35dce596a34a98fe8b9e952-1379521303
-rw------- 1 root root 10240 Sep 18 18:21 21f00b93d35dce596a34a98fe8b9e952-1379521303.bundle
-rw------- 1 root root 10240 Sep 18 18:21 37fxxxxxxx.bundle
-rw------- 1 root root 10240 Sep 18 18:21 af2xxxxxxx.bundle
-rw------- 1 root root 10240 Sep 18 18:21 f21c8f9dbad9409bba1e5ea2cafa2621-1379521274.bundle
ls: cannot open directory ./21f00b93d35dce596a34a98fe8b9e952-1379521303: Permission denied

sowings
Splunk Employee

I suspect, then, that there's a file in there which is root-owned and therefore can't be removed by the runtime user. Try ls -lR to check permissions and ownership on the files in that bundle directory (the 585c15ce10fa204cb7e48dd6330ba68c-1379519945 one).
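
That is, something like the following, run as root since the bundle directory isn't readable by the splunk user:

ls -lR /opt/splunk/var/run/splunk/cluster/remote-bundle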

Mahieu
Communicator

Actually ...

When logged in as user splunk in /opt/splunk/var/run/splunk/cluster/remote-bundle:

mv 585c15ce10fa204cb7e48dd6330ba68c-1379519945 /opt/splunk/etc/slave-apps/
mv: inter-device move failed: ‘585c15ce10fa204cb7e48dd6330ba68c-1379519945’ to ‘/opt/splunk/etc/slave-apps/585c15ce10fa204cb7e48dd6330ba68c-1379519945’; unable to remove target: Is a directory
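
That "inter-device move failed" is telling: mv can only rename atomically within a single filesystem, so the source and target must live on different ones here, and the fallback copy-and-delete then trips over the already-existing target directory. A quick way to confirm what each side is mounted on (findmnt is part of util-linux):

findmnt -T /opt/splunk/var/run/splunk/cluster/remote-bundle
findmnt -T /opt/splunk/etc/slave-apps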

Mahieu
Communicator

I can cd from / to /opt/splunk/etc/slave-apps, but there's nothing after that 😕

In etc, I ran "ll" and here's what I get for slave-apps:

drwxr-xr-x 2 splunk splunk 4096 18 sept. 17:04 slave-apps

Everything looks normal to me, but ... I'm not sure I've been looking in the right place.

Is this what you were asking?

sowings
Splunk Employee

Yes, but for other reasons. 🙂

I'd check permissions on the intervening paths, particularly with an eye to the "other" permissions. The target location from your log events is /opt/splunk/etc/slave-apps/default_ftpub, so check /opt, /opt/splunk, /opt/splunk/etc, and so on for at least "execute" permission on each directory. The Splunk user has to be able to "cd through" every directory in the path, and that's governed by the x bit on whichever of user, group, or other applies to the running user. It doesn't matter if the target dir is 777 if Splunk can't cd to it....
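
A quick way to audit the whole chain in one go (namei is part of util-linux):

namei -l /opt/splunk/etc/slave-apps/default_ftpub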

Mahieu
Communicator

Is it a problem if it's owned by root but chmodded to 777?
