Hi All, I am currently facing an issue where some remote hosts are not receiving a customized app. Yesterday I made some changes to inputs.conf, pushed them from the centralized repository to the deployment server instance, and on the deployment server ran the splunk reload deploy-server command to push the changes to all the remote Windows machines. We have more than 1,000 nodes with the UF agent installed, of which 20 are failing to receive the customized app.
From the Forwarder Management console, I can see the errors below for these 20 nodes.
"index=_internal sourcetype=splunkd record (New OR Updating) name=8DD546C9-C911-4B37-8227-C0182473C895 result=Fail | head 100"
UF agent version is 6.6.1
Splunk version is 6.6.1
Only the modified app, test-ia-windows, is not getting pushed to these 20 nodes; I can see the other apps communicating with the deployment server.
Kindly guide me on how to fix this issue.
I have seen this before on some versions of the UF. Some do not like the code in one of my apps.
Aside from upgrading those UFs, you might isolate the failing hosts in a different server class with the same apps, then remove the app from that server class. Then either create an empty app (just a folder skeleton, maybe with an empty /local/inputs.conf file, named _force_splunk_recycle) and set that app to restart Splunk, or have your custom app force a restart (but that might restart the service on 1000 Windows boxes). Then deploy the app back to that server class along with the _force_splunk_recycle app you made.
Basically this just removes the app, puts it back, and forces splunkd to restart on the UFs.
Sometimes this works for me, sometimes I have to RDP and reinstall Splunk.
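For reference, the server-class approach above could look roughly like this in serverclass.conf on the deployment server. This is only a sketch: the class name uf_redeploy_fix and the host names are made up for illustration, and the whitelist entries would need to match your 20 failing forwarders. The app names come from this thread.

```
# serverclass.conf on the deployment server (illustrative sketch)
# "uf_redeploy_fix" and the host names below are hypothetical.
[serverClass:uf_redeploy_fix]
# List only the forwarders that fail to receive the app
whitelist.0 = failing-host-01
whitelist.1 = failing-host-02

# The app that is not deploying. Remove this stanza first, reload,
# then add it back so the app is deleted and redeployed.
[serverClass:uf_redeploy_fix:app:test-ia-windows]
stateOnClient = enabled
restartSplunkd = false

# Empty helper app whose only purpose is to trigger a restart
[serverClass:uf_redeploy_fix:app:_force_splunk_recycle]
stateOnClient = enabled
restartSplunkd = true
```

After each change, run splunk reload deploy-server on the deployment server. Keeping restartSplunkd on the helper app rather than on the custom app means only the hosts in this class restart, not all 1000 boxes.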
Hi Duke, thanks for your effort on this. Instead of mapping and remapping the server class, can we re-install the Splunk agent on the 22-odd servers?
Kindly let me know whether re-installing the UF agent will fix this issue.
Thanks in advance.
I would stop the UF on one system first, delete the app from the ../etc/apps folder, start the service and let the app redeploy. If that fails, I would upgrade or reinstall the UF software. Then lastly I would uninstall/reinstall.
It stands to reason that the app is not the problem, or it would not have deployed successfully to other hosts. But if it still fails, check the splunkd log on these UFs for complaints about the app.
I can't guarantee anything. I'm just citing personal experience with picky UFs and deploying apps. We even had some Linux UFs that wouldn't take the SPLUNK_TA_NIX app until the partial app was removed and the service restarted.
Hi esix, we are using a Linux-based OS for the deployment server. Currently only 22 nodes are affected; the rest of the nodes got the required app. Kindly advise why we are getting the above-mentioned error only on these 22 nodes and how to fix it.
Thanks in advance.