We have a small satellite deployment of 40+ servers with a dedicated HF doubling as a Deployment Server running on Linux. The clients are an equal mix of Windows and Linux. 24 hours ago we discovered that a few of the Windows servers were being reported as no longer having the Windows_TA installed and as running the Linux_TA instead. Checking the UF hosts directly, they were in fact running the Windows_TA, even though the DS reported that they were running the Linux_TA.
After a day of trying to figure out how (validated filters, tested, removed and re-added all Server Classes and Apps), it continued. Throughout the day I noticed a few more hosts reporting this "mix-up", and again validated that those reported as Linux_TA were running Windows_TA. As a final drastic measure, I removed Splunk from the host (the HF/DS, not the UFs), reinstalled from scratch, and built the environment anew. Made sure the UFs were not running any of the distributed apps/TAs. Built new Apps and Server Classes. The UFs started phoning home, and once again the Windows servers were reported as having the Linux_TA while actually running the Windows_TA.
I have this exact same issue. I can prove that matching works fine for the Linux OS but does not work for Windows.
In fact, Windows gets matched as Linux.
Splunk claims this was fixed in 9.2.2, and it is listed in the "fixed issues" for this version. I wish I could confirm, but as of 9.2.2 my DS struggles to render any page in "Forwarder Management". Support has been struggling to determine the cause for 3+ months now.
Yeah, you have the same issue as me. Our Deployment Server started lagging on any function that needs to call the API for UF phone-home information. I called Support and they confirmed it as a "bug" that will be fixed in 9.4.
I updated to 9.3.1 recently: no more "wrong apps", but it is still very laggy. I need to run the reload deploy-server command each time I want to deploy a TA to our agents.
Has been officially registered as a bug. No ETA on fix
Splunk support concluded it was an "as yet discovered software bug"
I have nothing to add, except to say that I have observed the same bug, where the server classes that use machine filtering display the incorrect clients in the UI.
The bug remains in version 9.2.1.
I believe that this bug is planned to be fixed in 9.2.2
Would you mind sharing the serverclass.conf file?
Pretty simple....
[serverClass:All:app:all_outputs]
restartSplunkWeb = 0
restartSplunkd = 1
stateOnClient = enabled
[serverClass:All]
whitelist.0 = *
[serverClass:Windows:app:Splunk_TA_windows]
restartSplunkWeb = 0
restartSplunkd = 1
stateOnClient = enabled
[serverClass:Linux:app:Splunk_TA_nix]
restartSplunkWeb = 0
restartSplunkd = 1
stateOnClient = enabled
[serverClass:All:app:all_deploymentclient]
restartSplunkWeb = 0
restartSplunkd = 1
stateOnClient = enabled
[serverClass:Linux]
machineTypesFilter = linux-x86_64
whitelist.0 = *
[serverClass:Windows]
machineTypesFilter = windows-x64
whitelist.0 = *
Based on the docs this should work. BUT in the examples section those platform selections are done at the app level, not the server class level. Maybe you should try that?
Btw, did you configure this through the GUI or manually with a text editor?
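For anyone who wants to try the app-level variant, here is a rough sketch of what it could look like in serverclass.conf. It reuses the app names and machine types from the config posted above; the server class name "AllHosts" is made up for illustration. Whether this actually sidesteps the bug is untested, so treat it as an experiment rather than a fix.

```
# Sketch only: one shared server class, with the OS filter moved
# down to each app stanza. "AllHosts" is a hypothetical class name.
[serverClass:AllHosts]
whitelist.0 = *

[serverClass:AllHosts:app:Splunk_TA_windows]
machineTypesFilter = windows-x64
restartSplunkWeb = 0
restartSplunkd = 1
stateOnClient = enabled

[serverClass:AllHosts:app:Splunk_TA_nix]
machineTypesFilter = linux-x86_64
restartSplunkWeb = 0
restartSplunkd = 1
stateOnClient = enabled
```

As far as I know, app-level filters have to be edited by hand and may not be shown or editable in the Forwarder Management UI, so keep a backup of the original file and reload the deployment server after changing it.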
It's such a small and straightforward environment that I used the GUI. I get the sense there is a bug in 9.2.0.x.
These seem like fairly simple / basic configurations. I would suggest raising a Support case to get this troubleshot and fixed.
@isoutamo thoughts?
I opened a P2 3 days ago... still waiting. Typical
Hello @tlmayes, How are you whitelisting the hosts? Do you just want to use this nice feature of filtering everything by the OS type? Screenshot below -
With the above way, you can create 2 separate server classes for Windows and Linux and whitelist all the hosts.
Please accept the solution and hit Karma, if this helps!
From what *I* have seen, the machineTypesFilter seems to be at the root of this bug.
This is the absolute *WORST* update that I've seen in the last 6 years that I've been working with Splunk.
I did read something that would indicate that the (white|black)list.X can also take OS Strings, but the docs call it "platform dependent", so I am putting it off until we can actually SEE what's being deployed again.
I have noticed yet *ANOTHER* bug... After a Deployment Server has been running for a bit, ANY CALL that would query DS Client information will TIMEOUT.
I have multiple scripts that read data from /services/deployment/server/clients, and even after bumping the timeouts to 30 seconds, the calls still time out. It used to take < 2s to pull data from THOUSANDS of clients.
Yes, filtering by OS. Rebuilt the DS from scratch, set filters (using the OS filter). All Linux servers receive the Linux TA. All Windows Servers receive the Linux TA, and confirmed the OS filter, again 😕