After a recent deploy-server reload, all of my Splunk_TA_Windows clients except for 5 started showing the following error:
11-12-2015 14:43:24.037 -0800 WARN ClientSessionsManager - ip=10.x.x.x name=946A6046-907B-408A-8887-77350DA2A96C Updating record for sc=Windows app=Splunk_TA_windows: action=Install result=Fail checksum=331432938900507147
The 5 that work are Exchange servers, run by a different system admin, but I can't find anything significantly different. We cleared this up before with newly deployed Universal Forwarders by deleting the Splunk_TA_Windows directory on the client machines and bouncing the UF, but I don't want to have to do this to 42 servers!
When I did my deploy-server reload, I was making changes in another app, only to set a TZ variable in a "local" props.conf for a particular sourcetype.
Does anyone have any clues about how to best fix this error permanently?
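For anyone triaging the same warning, here is a minimal sketch for listing which clients report failed installs per app. It assumes the WARN line has the same layout as the one quoted in the question; other Splunk versions may format it differently, and the function name and regular expression are illustrative, not part of Splunk.

```python
import re
from collections import defaultdict

# Pattern modeled on the single WARN line quoted in the question (an assumption:
# other Splunk versions may lay the fields out differently).
FAIL_RE = re.compile(
    r"ClientSessionsManager - ip=(?P<ip>\S+) name=(?P<name>\S+) "
    r"Updating record for sc=(?P<sc>\S+) app=(?P<app>\S+): "
    r"action=(?P<action>\S+) result=Fail"
)


def failed_installs(lines):
    """Map each app name to the set of (ip, client name) pairs that reported result=Fail."""
    failures = defaultdict(set)
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            failures[match["app"]].add((match["ip"], match["name"]))
    return failures


# The sample is the log line from the question, unchanged.
sample = [
    "11-12-2015 14:43:24.037 -0800 WARN ClientSessionsManager - ip=10.x.x.x "
    "name=946A6046-907B-408A-8887-77350DA2A96C Updating record for sc=Windows "
    "app=Splunk_TA_windows: action=Install result=Fail checksum=331432938900507147",
]

for app, clients in failed_installs(sample).items():
    print(app, sorted(clients))
```

In practice the lines would come from the deployment server's splunkd.log instead of the inline sample.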
We had the same issue and resolved it. The broken clients were all trying to download two versions of an app.
I have Splunk_TA_Squid_SiteA and Splunk_TA_Squid_SiteB with competing configurations for the same Squid logs.
I didn't realize Server Class SiteA Include was including my SiteB clients. My SiteB clients were downloading both SiteA Squid and SiteB Squid.
Once I updated Server Class Site A to only apply to Site A clients and not Site B clients, my Site B clients stopped generating the CheckSum errors.
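A rough way to spot the overlap described above is to check which clients match the include patterns of more than one server class. The sketch below is only an approximation: it uses Python's shell-style wildcard matching rather than Splunk's own matching logic, it ignores exclude lists and other filters, and the host names and patterns are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical include (whitelist) patterns per server class.
server_classes = {
    "SiteA": ["squid-*"],        # too broad: also matches Site B hosts
    "SiteB": ["squid-siteb-*"],
}

# Hypothetical client host names.
clients = ["squid-sitea-01", "squid-siteb-01"]


def classes_for(client, server_classes):
    """Return the sorted names of server classes whose patterns match the client."""
    return sorted(
        name
        for name, patterns in server_classes.items()
        if any(fnmatchcase(client, pattern) for pattern in patterns)
    )


def overlapping_clients(clients, server_classes):
    """Return {client: [server classes]} for clients matched by more than one class."""
    overlaps = {}
    for client in clients:
        matched = classes_for(client, server_classes)
        if len(matched) > 1:
            overlaps[client] = matched
    return overlaps


print(overlapping_clients(clients, server_classes))
# prints {'squid-siteb-01': ['SiteA', 'SiteB']}
```

Narrowing the SiteA pattern to something like "squid-sitea-*" (again, a made-up name) makes the overlap disappear in this example.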
This will happen if Splunk on the deployment client cannot overwrite files in the target app (because it does not have permissions), or if you have set up the DS to deploy only some pieces of an app that came with your Splunk installation and you have modified a file in the default directory (don't do that).
I faced the same issue with another app on Linux, when someone started the forwarder as the root user and then started it again as the splunk user. The workaround is to create a new server class (with a new name) and add the apps and clients to it.
Check to see if the app being deployed via the deployment server contains an install_source_checksum in its app.conf file. If it does, try removing that from the client and the deployment server and applying the change again.
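A minimal sketch of that check, assuming you have the contents of the app's app.conf as text. It scans line by line rather than using a strict INI parser, since Splunk .conf files are not guaranteed to be valid INI. The function name and the sample checksum value are made up for illustration.

```python
def find_install_source_checksum(conf_text):
    """Return (line number, line) pairs for every install_source_checksum setting."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comment lines
        key = stripped.split("=", 1)[0].strip()
        if key == "install_source_checksum":
            hits.append((number, stripped))
    return hits


# Hypothetical app.conf contents; the checksum value is invented.
sample = """\
[install]
state = enabled
install_source_checksum = 0123abcd

[launcher]
version = 1.0
"""

print(find_install_source_checksum(sample))
# prints [(3, 'install_source_checksum = 0123abcd')]
```

To run it against a real file, read the app.conf from the app's default or local directory (on both the client and the deployment server) and pass the text to the function.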
So, I don't know if I resolved it, but I finally got myself access to a server with this problem. What I found in the c:\program files\splunkuniversalforwarder\var\log\splunk\splunkd.log was the following:
03-29-2016 14:41:07.542 -0700 ERROR DeployedApplication - app=Splunk_TA_windows was already installed via search head cluster deployer, UI, CLI, or REST API; it may not be overridden via deployment server; remove existing app=Splunk_TA_windows via search head cluster deployer, UI, CLI, or REST API if you wish to install it via deployment server
So, perhaps this is an artifact of my Deployment Server being the same server as an Indexer? Something I will fix in the future. I guess each time I update in the meantime, I have to have admins remove the directory so it deploys. Boo.
FYI. Had this same issue but with the Splunk_TA_oracle on a linux box. Physically deleting the app on the client server and restarting the client did the trick. This was a search head so the issue is not limited to the UF. This was on 6.3.0.
I still haven't been able to fix this. I don't have spaces in the server class name. And, I don't see where @ssievert sees the server class name as "Global", it's clearly "Windows", and I did just double check that it doesn't have a space at the end either.
Most recently, I used the GUI to Uninstall the App, then I deleted $SPLUNK_HOME/etc/deployment-apps/Splunk_TA_Windows and re-copied it from $SPLUNK_HOME/etc/apps/, restarted the server, added the Windows Serverclass back to the app, set it to restart Splunkd on the clients, and it STILL comes back with Checksum failed. Still looking for clues...
As I said and @mikaelbje linked to, I figured out the deletion of Splunk_TA_windows on the client and restarting the forwarder a while back, when it was just a couple servers that I was installing fresh on using the command line. It worked for a little while, then the error came back when I upgraded the app on the server and went to redeploy the client. I see yet another version has been deployed, so when I have time in January I guess I'll try to have my admins delete and re-install all the clients again. I seem to be getting some data still, but the errors continue.
I'm having the same issue here. The initial push of Splunk_TA_windows worked great. I edited the inputs.conf file and ran splunk reload deploy-server, and now I'm receiving the same error as @craigkleen. I tried deleting the app on the UF, then ran splunk reload deploy-server again, and received the same error. Is there another workaround for this, or will this be fixed in the next release?