Getting Data In

What are the pain points with deploying your Splunk architecture on Windows OS?

woodcock
Esteemed Legend

I am growing very tired of being asked to justify my "undocumented" and "bigoted" best practice of NEVER deploying Splunk infrastructure (Search Heads, Indexers, License Manager, Cluster Master, Deployer, Deployment Server, Monitoring Console, etc.) on any Windows OS. I am sure many of you have faced the same challenge. I have created this question so that we can build a canonical list at a single URL that we can all share, where the best and brightest of us can share our past pain with the kind intention of helping others avoid the Windows path of perfectly avoidable regret. If you think that you will use this Q&A as a reference point, then please me-too the question. If you have just cause to avoid Windows, then P*L*E*A*S*E post your answer. Remember, friends don't let friends deploy on Windows: let's give them the facts that they need to push back successfully. Please include links to documented disasters when possible. Keep in mind that I probably will never accept any answer to this question (to encourage others to participate in perpetuity). Let's do one objection per answer and vote on the best objections so that the most important ones filter to the top.
ATTENTION!!! ATTENTION!!!!
THERE ARE NOW MORE ANSWERS THAN FIT ON A SINGLE PAGE (NOTE PAGINATION CONTROLS AT THE BOTTOM)!

jkat54
SplunkTrust

This was going to be my main point. Note that some regex works differently on Windows, and this is undocumented: getting whitelist and blacklist regexes right can be an arduous task of trial and error.
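One way to cut down on the trial and error is to test a candidate whitelist/blacklist pattern against sample Windows paths before deploying it. The sketch below uses Python's `re` module, which is not the PCRE engine Splunk uses, so treat it only as a first pass; the sample paths and the `matching_paths` helper are hypothetical. It shows two common surprises: Windows paths use backslashes, which must be escaped in the pattern, and Windows file names are case-insensitive while a regex is case-sensitive by default.

```python
import re

# Hypothetical sample paths, as a Windows forwarder would see them.
SAMPLE_PATHS = [
    r"C:\Windows\System32\LogFiles\Firewall\pfirewall.log",
    r"C:\inetpub\logs\LogFiles\W3SVC1\u_ex210501.log",
    r"C:\Program Files\App\debug.LOG",
]


def matching_paths(pattern, paths, ignore_case=False):
    """Return the paths that the pattern matches anywhere (re.search)."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    return [p for p in paths if compiled.search(p)]


if __name__ == "__main__":
    # A *NIX-style pattern with forward slashes matches nothing.
    print(matching_paths(r"/LogFiles/", SAMPLE_PATHS))
    # Escaped backslashes match the two LogFiles paths.
    print(matching_paths(r"\\LogFiles\\", SAMPLE_PATHS))
    # "debug.LOG" is missed unless matching is case-insensitive.
    print(len(matching_paths(r"\.log$", SAMPLE_PATHS)))
    print(len(matching_paths(r"\.log$", SAMPLE_PATHS, ignore_case=True)))
```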


woodcock
Esteemed Legend

If you ever have blocked queues, you may find that your Indexers suddenly refuse to receive data from forwarders, requiring the whole Indexer tier to be rebooted (this does not happen on Linux Indexers):

https://docs.splunk.com/Documentation/Splunk/6.6.0/Forwarding/Receiverconnection#Receiver_does_not_a...
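To catch blocked queues before it gets to that point, the usual place to look is `metrics.log` in `_internal` (for example, `index=_internal source=*metrics.log group=queue blocked=true`). For offline triage of a copied log file, a minimal Python sketch that counts `blocked=true` samples per queue is below; the sample lines are hypothetical and only approximate the real `group=queue` line format, and `blocked_queue_counts` is an illustrative helper, so check both against your own logs.

```python
from collections import Counter

# Hypothetical lines approximating the metrics.log "group=queue" format.
SAMPLE_LINES = [
    "05-01-2021 10:00:00.000 +0000 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    "05-01-2021 10:00:30.000 +0000 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    "05-01-2021 10:00:30.000 +0000 INFO  Metrics - group=queue, name=parsingqueue, max_size_kb=6144, current_size_kb=10",
    "05-01-2021 10:00:30.000 +0000 INFO  Metrics - group=thruput, name=index_thruput, instantaneous_kbps=12.5",
]


def blocked_queue_counts(lines):
    """Count blocked=true queue samples per queue name."""
    counts = Counter()
    for line in lines:
        # Everything after the first " - " is a comma-separated key=value list.
        _, sep, body = line.strip().partition(" - ")
        if not sep:
            continue
        fields = dict(
            pair.split("=", 1) for pair in body.split(", ") if "=" in pair
        )
        if fields.get("group") == "queue" and fields.get("blocked") == "true":
            counts[fields.get("name", "unknown")] += 1
    return counts


if __name__ == "__main__":
    for queue, n in blocked_queue_counts(SAMPLE_LINES).most_common():
        print(f"{queue}: blocked in {n} samples")
```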


woodcock
Esteemed Legend

There is some kind of intractable race condition between the Windows Splunk service and many logging services, such that a standard installation of Splunk can come up in a state where events cannot be forwarded without corruption. The workaround is to delay the start of the Splunk service, but even this does not always prevent the problem (although it usually does). Keep in mind that you need to monitor the OS on your Splunk infrastructure, too, so problems forwarding the security logs from those hosts are big problems. See here:

https://answers.splunk.com/answers/200924/formatmessage-error-appears-in-indexed-message-for.html

This makes Windows a risky option for Heavy Forwarder or Syslog+UF.
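One way to delay the start on Windows is to switch the service to "Automatic (Delayed Start)" with `sc.exe`. A minimal sketch is below; the service name is an assumption (`SplunkForwarder` is the usual default for a Universal Forwarder and `Splunkd` for Splunk Enterprise, but check `services.msc` on your hosts), `delayed_start_command` is an illustrative helper, and the script only runs the command on Windows.

```python
import subprocess
import sys
from typing import List


def delayed_start_command(service_name: str) -> List[str]:
    """Build the sc.exe command that sets a service to delayed auto-start."""
    # sc.exe expects a space after "start=", so "start=" and the value are
    # passed as two separate arguments.
    return ["sc.exe", "config", service_name, "start=", "delayed-auto"]


if __name__ == "__main__":
    # Assumed service name; adjust for your installation.
    cmd = delayed_start_command("SplunkForwarder")
    if sys.platform == "win32":
        # Requires an elevated (administrator) prompt.
        subprocess.run(cmd, check=True)
    else:
        print("Would run:", " ".join(cmd))
```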

mattymo
Splunk Employee

This one comes up a lot during patching cycles!


woodcock
Esteemed Legend

That's when I pull out my "I told you so" card.


lycollicott
Motivator

Windows permissions and file ownership, particularly on indexers. I have had too many BucketMover inflight errors because either LocalSystem or an MSA could not create, delete, or rename folders. There are workarounds, and you can routinely icacls.exe it, but who has ever had to cron chmod or chown commands on their *NIX indexers? No one.
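For reference, the kind of routine `icacls` fix meant here looks roughly like the sketch below, which grants the Splunk service account inheritable full control over an index volume. The path, the account, and the `icacls_grant_command` helper are hypothetical (`NT AUTHORITY\SYSTEM` is LocalSystem; an MSA would be something like `DOMAIN\name$`), and whether full control is appropriate for your site is a judgment call.

```python
import subprocess
import sys
from typing import List


def icacls_grant_command(path: str, account: str, recursive: bool = True) -> List[str]:
    """Build an icacls command granting `account` inheritable full control."""
    # (OI)(CI) = object inherit + container inherit, so new files and
    # subfolders (e.g. new buckets) pick up the grant; F = full control.
    cmd = ["icacls", path, "/grant", f"{account}:(OI)(CI)F"]
    if recursive:
        # /T also applies the change to everything that already exists below path.
        cmd.append("/T")
    return cmd


if __name__ == "__main__":
    # Hypothetical index path and service account.
    cmd = icacls_grant_command(r"D:\SplunkIndexes", r"NT AUTHORITY\SYSTEM")
    if sys.platform == "win32":
        subprocess.run(cmd, check=True)
    else:
        print("Would run:", " ".join(cmd))
```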

woodcock
Esteemed Legend

Most of the Splunk documentation (and especially the training material) is *NIX-focused. Things are much better now, but even so, in most classes that I attended (even last year) there was somebody on Windows whose cut-and-paste would not work because the commands were wrong for Windows. This is evident because the instructors give everyone heads-up warnings about these problems.

Richfez
SplunkTrust

The Python interpreter on Windows is far slower than on Linux. I believe it mainly affects Enterprise Security, but nothing is broken; a lot of things just take longer to run.
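If you want to check this on your own hardware, a crude way is to time the same pure-Python workload on a Windows box and a Linux box, ideally with Splunk's bundled interpreter (via `splunk cmd python`). The workload below is an arbitrary CPU-bound stand-in, not a measurement of Enterprise Security itself, and the numbers only mean something when compared on comparable hardware.

```python
import platform
import sys
import timeit


def workload(n: int = 20000) -> dict:
    """Arbitrary CPU-bound, pure-Python work: string building and dict updates."""
    counts = {}
    for i in range(n):
        key = "k" + str(i % 100)
        counts[key] = counts.get(key, 0) + 1
    return counts


def best_time(repeat: int = 5, number: int = 10) -> float:
    """Best-of-`repeat` wall time, in seconds, for `number` calls of workload()."""
    return min(timeit.repeat(workload, repeat=repeat, number=number))


if __name__ == "__main__":
    print(platform.system(), sys.version.split()[0], f"{best_time():.3f}s")
```

Taking the minimum of several repeats, rather than the mean, reduces noise from other processes on the box.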

jacobpevans
Motivator

Possibly related: our environment is virtually unable to run Qmulos - Compliance on Windows Server 2016. Clicking any "Submit"-style button takes 10-20 seconds to load, regardless of the button's function. It is now an officially recognized bug. The problem does not occur on any other Windows OS version.

Cheers,
Jacob


JRAnderson
Explorer

Thanks, just another push for Linux.


woodcock
Esteemed Legend

Most Splunk admins that I know have had many cases where Splunk Indexers and Search Heads on Windows have crashed due to memory leaks. I have NEVER seen this happen on *NIX (although I am sure that on rare occasion it has). Many *.0 releases of Splunk on Windows have contained a memory leak that made it through testing, but the *NIX releases have not.
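A leak like this usually shows up as a steady climb in `splunkd` memory long before the crash. A minimal sketch for flagging that climb fits a least-squares slope to periodic (hours, MB) samples taken from whatever monitoring you already have; the helper names, the samples, and the 10 MB/hour threshold are hypothetical and would need tuning for your environment.

```python
from typing import List, Tuple


def slope_mb_per_hour(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of memory (MB) against time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need at least two distinct timestamps")
    return num / den


def looks_like_leak(samples, threshold_mb_per_hour: float = 10.0) -> bool:
    """Flag a sustained upward trend; the default threshold is arbitrary."""
    return slope_mb_per_hour(samples) > threshold_mb_per_hour


if __name__ == "__main__":
    steady = [(0, 2000), (1, 2010), (2, 1995), (3, 2005)]
    climbing = [(0, 2000), (1, 2050), (2, 2100), (3, 2150)]
    print(looks_like_leak(steady), looks_like_leak(climbing))  # False True
```

A fitted slope is less jumpy than comparing the first and last samples, so a single spike does not trigger the flag.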


woodcock
Esteemed Legend

I have now had this happen several times on *NIX in the 7.* releases (shame on Splunk for not doing regression/capacity testing with bounds-checking).


Richfez
SplunkTrust

If we get to the point where this has stabilized, I think it would behoove us to specify the range of releases where this was a problem (IIRC, 7.2.0 through 7.2.3?), just so we know it's not an ongoing problem.


Richfez
SplunkTrust

All but one: I ran a single SH/Indexer box on Windows for years, from version 4.3 (oh, I may even have skipped 5.x entirely!) to 6.0. I had no significant problems (possibly none at all; I can't remember in that much detail, but certainly nothing serious).


woodcock
Esteemed Legend

True, I do know you, @rich7177!


Richfez
SplunkTrust

Yes. 🙂

Note I'm STILL not recommending Splunk on Windows*, just saying I had no problems in several years of running Splunk on Windows.

* When might I suggest Windows?

A small shop running Splunk Free (capped at 500 MB/day), or one with no use for more than 10 or 20 GB/day of license because they simply don't have that much going on: a Windows all-in-one box would probably be fine.

Small shops that have no Linux experience. Again, with a limit of maybe up to 50 GB/day and no replication requirements.

Places with no real IT people, just a guy on site who can take care of the few day-to-day things...

Wait, I see the common thread - Very small places (data-wise) with little to no Linux experience.


woodcock
Esteemed Legend

It is still a Windows "best practice" to have a monthly reboot (if not more frequently). I have seen Linux indexers that have an uptime of YEARS. Who can afford monthly Indexer downtime just so that the host OS doesn't crash?

martin_mueller
SplunkTrust

A rolling reboot in a cluster shouldn't pose a big issue.
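For an indexer cluster, the cluster manager can already do this with `splunk rolling-restart cluster-peers`, restarting a percentage of peers at a time (as I understand it, controlled by `percent_peers_to_restart` in the `[clustering]` stanza of `server.conf`, default 10). A minimal sketch of that batching idea is below; the `restart_batches` helper and peer names are hypothetical, and rounding down with a minimum of one peer per batch is this sketch's assumption, not necessarily what Splunk does.

```python
import math
from typing import List


def restart_batches(peers: List[str], percent_peers_to_restart: int = 10) -> List[List[str]]:
    """Split peers into consecutive restart batches of roughly the given percent."""
    if not 0 < percent_peers_to_restart <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Assumption: round down, but always restart at least one peer per batch.
    batch_size = max(1, math.floor(len(peers) * percent_peers_to_restart / 100))
    return [peers[i:i + batch_size] for i in range(0, len(peers), batch_size)]


if __name__ == "__main__":
    peers = [f"idx{i}" for i in range(1, 26)]
    for n, batch in enumerate(restart_batches(peers), start=1):
        print(f"batch {n}: {', '.join(batch)}")
```

With 25 peers at 10%, this gives twelve batches of two and a final batch of one, so most of the tier stays up at any moment.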

Richfez
SplunkTrust

I'm not disagreeing that this was a problem ages ago, nor am I in any way suggesting running Splunk on Windows, but I think this problem is long past now.

I can say without any reservation that you can get years of up-time on Server 2008 and newer easily. Though obviously if you patch - which applies to both Linux and Windows - you'll be rebooting them at least occasionally.

woodcock
Esteemed Legend

To be fair, though, THP (Transparent Huge Pages) was a huge disaster that was (and is) a big black eye for Splunk on *NIX.


woodcock
Esteemed Legend

True, but in most cases, *NIX can be patched/upgraded without a reboot.
