I am growing very tired of being asked to justify my "undocumented" and "bigoted" best practice of NEVER deploying Splunk infrastructure (Search Heads, Indexers, License Manager, Cluster Master, Deployer, Deployment Server, Monitoring Console, etc.) on any Windows OS. I am sure many of you have faced the same challenge. I have created this question so that we can build a canonical list and all share the same URL, where the best and brightest of us can share our past pain with the kind intention of helping others avoid the Windows path of perfectly-avoidable regret. If you think you will use this Q&A as a reference point, then please "me too" the question. If you have just cause to avoid Windows, then P*L*E*A*S*E post your answer. Remember, friends don't let friends deploy on Windows: let's give them the facts they need to push back successfully. Please include links to documented disasters when possible. Keep in mind that I will probably never accept any answer to this question (to encourage others to participate in perpetuity). Let's do one objection per answer and vote on the best objections so that the most important ones filter to the top.
Running multiple UF instances on one box is a fragile, via-Support-only affair under Windows, as opposed to *NIX, where you just unpack the tgz multiple times and set a few configs. (A sketch of the *NIX approach follows.)
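For contrast, here is roughly what that *NIX approach looks like. This is a minimal sketch; the tarball name, install paths, and the second management port are illustrative assumptions, not prescriptions:

    # Unpack two independent UF instances from the same tarball
    mkdir -p /opt/uf1 /opt/uf2
    tar -xzf splunkforwarder.tgz -C /opt/uf1 --strip-components=1
    tar -xzf splunkforwarder.tgz -C /opt/uf2 --strip-components=1

    # Give the second instance its own management port so the two
    # splunkd processes don't collide (8090 is an arbitrary choice)
    printf '[settings]\nmgmtHostPort = 127.0.0.1:8090\n' \
        > /opt/uf2/etc/system/local/web.conf

    # Start both instances
    /opt/uf1/bin/splunk start --accept-license --no-prompt
    /opt/uf2/bin/splunk start --accept-license --no-prompt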
True, and this will be necessary if you are forwarding compressed files, because the archive queue handler (AEQ, a.k.a. AQ) is single-threaded and becomes a HUGE bottleneck with even small numbers of *.zip files. I once had ~30 UF instances installed on a single host just to handle incoming *.zip files.
Sometimes the problem isn't a bug; rather, things are unexpectedly "just different" on Windows. The following Splunk documentation link is a good starting point for learning the differences between Unix and Windows operations:
This was going to be my main point. Note that some regexes work differently on Windows, and this is undocumented; getting whitelisting and blacklisting regexes right can be an arduous task of trial and error. See the sketch below for the shape of the problem.
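A classic instance of that trial and error: whitelisting files under a monitored Windows path. The whitelist is a regex matched against the full file path, so the path's own backslashes have to be escaped in the pattern. This sketch is hypothetical (the stanza path and filenames are invented):

    # inputs.conf on a Windows UF
    [monitor://C:\inetpub\logs\LogFiles]
    # whitelist is a regex against the full path, so literal
    # backslashes in the path must be doubled in the pattern
    whitelist = \\W3SVC\d+\\.*\.log$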
If you ever have blocked queues, you may find that your Indexers suddenly refuse to receive data from forwarders, requiring the whole Indexer tier to be rebooted (this does not happen on Linux Indexers):
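As an aside, one generic way to spot blocked queues before an indexer wedges is to search the internal metrics log from the Monitoring Console or any search head. A minimal sketch (the blocked=true field in metrics.log is standard; the stats breakdown is just one way to slice it):

    index=_internal source=*metrics.log* group=queue blocked=true
    | stats count by host, name
    | sort - count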
There is some kind of intractable race condition between the Windows Splunk service and many logging services, such that a standard installation of Splunk can come up in a state where events cannot be forwarded without corruption. The workaround is to delay the start of the Splunk service, but even this does not always prevent the problem (although it usually does). Keep in mind that you need to monitor the OS on your Splunk infrastructure too, so problems forwarding security logs there are big problems. See here:
https://answers.splunk.com/answers/200924/formatmessage-error-appears-in-indexed-message-for.html
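For anyone applying the delayed-start workaround, it can be scripted with sc.exe. A minimal sketch, assuming the Universal Forwarder's default service name (a full Splunk Enterprise install registers Splunkd instead):

    :: Set the Splunk UF service to delayed automatic start.
    :: Note: sc.exe requires the space after "start=".
    sc.exe config SplunkForwarder start= delayed-auto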
This makes Windows a risky option for Heavy Forwarder or Syslog+UF.
This one comes up a lot during patching cycles!
That's when I pull out my "I told you so" card.
Windows permissions and file ownership, particularly on indexers. I have had too many "BucketMover inflight" errors because either LocalSystem or an MSA could not create, delete, or rename folders. There are workarounds, and you can routinely run icacls.exe against the data volumes, but who has ever had to cron chmod or chown commands on their *NIX indexers? No one.
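To make the contrast concrete, here is the kind of recurring workaround this implies on Windows. The volume path and service account name are hypothetical placeholders:

    :: Scheduled icacls pass so the Splunk service account can
    :: create/rename/delete bucket folders under the index volume
    icacls.exe "D:\splunkdata" /grant "DOMAIN\svc_splunk:(OI)(CI)F" /T /C

versus the *NIX equivalent that you run once (if ever):

    chown -R splunk:splunk /opt/splunk/var/lib/splunk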
Most of the Splunk documentation (and especially the training documentation) is *NIX-focused. Things are much better now, but even so, in most classes I attended (even last year) there was somebody on Windows whose cut-and-paste would not work because it was wrong for their platform. This is clearly a known problem, because the instructors proactively warn everyone about it in the class chat.
The Python interpreter on Windows is far slower than on Linux. I believe this mainly affects Enterprise Security; nothing is broken, but a lot of things take longer to run.
Possibly related: our environment is virtually unable to run Qmulos Compliance on Windows Server 2016. Clicking any "Submit"-style button takes 10-20 seconds to respond, regardless of the button's function. It is now an officially recognized bug. The problem does not occur on any other Windows OS version.
Thanks, just another push for Linux.
Most Splunk admins I know have had many cases where Splunk Indexers and Search Heads crashed due to memory leaks in the OS. I have NEVER seen this happen on *NIX (although I am sure it has on rare occasions). Many *.0 releases of Splunk on Windows have shipped with a memory leak that made it through testing; the *NIX releases have not.
I have now had this happen several times on *NIX in the 7.* releases (shame on Splunk for not doing regression/capacity testing with bounds checking).
If we get to the point where this has stabilized, it would behoove us to specify a reasonable range of releases where this was a problem (IIRC, 7.2.0 through 7.2.3?), just so we know it's not an ongoing problem.
All but one: I ran a single SH/Indexer box on Windows for years, from version 4.3 through 6.0 (I may even have skipped 5.x entirely). I had no significant problems; possibly none at all, though I can't remember in that much detail, but certainly nothing serious.
True, I do know you, @rich7177!
Yes. 🙂
Note that I'm STILL not recommending Splunk on Windows*; I'm just saying I had no problems in several years of running Splunk on Windows.
A small shop running Splunk Free, with no use for more than 10 or 20 GB/day of license because they simply don't have that much going on: a Windows all-in-one box would probably be fine.
Small shops that have no Linux experience. Again, with maybe up to a 50 GB/day limit and no replication requirements.
Places with no real IT people, just a person on site who can take care of the few day-to-day things...
Wait, I see the common thread: very small places (data-wise) with little to no Linux experience.
It is still a Windows "best practice" to reboot monthly (if not more frequently). I have seen Linux indexers with uptimes of YEARS. Who can afford monthly Indexer downtime just so the host OS doesn't crash?
A rolling reboot in a cluster shouldn't pose a big issue.
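Agreed; for the record, a peer can be taken down for an OS reboot without upsetting the cluster. A minimal sketch, assuming a default *NIX install path and a healthy cluster with enough bucket copies to stay searchable:

    # On the indexer peer about to be patched: gracefully hand off
    # this peer's duties before the OS goes down
    /opt/splunk/bin/splunk offline

    # ... patch and reboot the OS; the peer rejoins on restart and
    # the cluster master rebalances as needed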