I am growing very tired of being asked to justify my "undocumented" and "bigoted" best practice of NEVER deploying Splunk infrastructure (Search Heads, Indexers, License Manager, Cluster Master, Deployer, Deployment Server, Monitoring Console, etc.) on any Windows OS. I am sure many of you have faced the same challenge. I have created this question so that we can build a canonical list at a single URL we can all share, where the best and brightest of us can describe our past pain with the kind intention of helping others avoid the Windows path of perfectly avoidable regret. If you think you will use this Q&A as a reference point, then please do me-too the question. If you have just cause to avoid Windows, then P*L*E*A*S*E post your answer. Remember, friends don't let friends deploy on Windows: let's give them the facts they need to successfully push back. Please include links to documented disasters when possible. Keep in mind that I probably will never accept any answer to this question (to encourage others to participate in perpetuity). Let's do one objection per answer and vote on the best objections so that the most important ones filter to the top.
A typical way of restarting splunkd.exe (going into the Services manager and restarting the service instead of running "splunk restart", so that you don't have to bother with Splunk admin accounts) often ends with a "the service failed to stop" message, because the default service timeouts are way too short. The Linux equivalents, "service splunk restart" or "systemctl restart splunk", don't typically have this problem.
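If you do restart from a script instead of the Services console, one workaround is to stop the service and then poll its state with a much longer timeout than the console uses. A minimal Python sketch, assuming the Windows service name is `Splunkd` (confirm it with `sc.exe query` on your host) and that `sc.exe` is on the PATH; the 600-second timeout and 5-second poll interval are arbitrary illustrative values, and "splunk restart" remains the simpler route if you have the credentials.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# Assumption: the Windows service name is "Splunkd"; confirm with `sc.exe query`.
SERVICE = "Splunkd"


def parse_state(sc_output: str) -> Optional[str]:
    """Pull the state name (RUNNING, STOP_PENDING, STOPPED, ...) out of `sc query` text."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def sc_query(service: str = SERVICE) -> str:
    """Run `sc.exe query <service>` (Windows only) and return its stdout."""
    return subprocess.run(["sc.exe", "query", service],
                          capture_output=True, text=True).stdout


def wait_for_state(target: str, query: Callable[[], str] = sc_query,
                   timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll `query` until it reports `target`; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if parse_state(query()) == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def restart(service: str = SERVICE, timeout_s: float = 600.0) -> bool:
    """Stop the service, wait far longer than the Services console does, then start it."""
    subprocess.run(["sc.exe", "stop", service], check=False)
    if not wait_for_state("STOPPED", lambda: sc_query(service), timeout_s):
        return False
    subprocess.run(["sc.exe", "start", service], check=False)
    return wait_for_state("RUNNING", lambda: sc_query(service), timeout_s)
```

The `query` parameter is injectable so the polling logic can be exercised without a real Windows service.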
I completely agree with you: I've been working with Splunk for eleven years and have seen only one production system running on Windows. It was not very large (40 GB/day and fewer than 40 clients), and even they migrated to Linux!
I found many problems, or higher resource requirements, on Windows systems (e.g. the completely useless graphical interface), and especially many problems with a Deployment Server on Windows, which is unable to deploy apps to Linux servers.
Windows can (maybe) be used for testing.
Ciao to all.
I don't have links, but from personal experience, the Windows OS has higher overhead, so all the sizing recommendations are undersized compared to real-world behavior.
In our specific scenario, measuring CPU/memory utilization in the late days of v5 (pre-clustering), we saved nearly half of our available resources by moving from Windows to Linux (we were able to cut roughly half of our hardware and still see better performance). There have been vast improvements in how Splunk uses available resources since then, so I suspect the differences won't be so egregious in 7.x, 8.x or 9.x, but I would also be shocked if Windows ever performs better than *nix, given that so much of Splunk development starts with *nix as the assumed OS.
The default behaviour of Windows Event Logs, and the way they are rolled over, will result in event loss!
Confirmed by multiple customers and by Splunk Support (Case# 1214731). Answer from Support:
Basically, when the log is full, the O/S overwrites the file, and that does not give us enough time to finish reading the event. Thus, they recommend setting "Overwrite events as needed (oldest events first)" instead. I am not sure if it's possible, but if the O/S could save the events to another folder just before archiving, we could monitor that folder instead. However, when I look at the EventLog properties, that feature does not exist. Please use "Overwrite events as needed (oldest events first)" to avoid the missing-event issue.
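To apply Support's recommendation from a script rather than clicking through Event Viewer, `wevtutil sl` accepts `/rt:false` (retention off, i.e. overwrite as needed) and `/ms:` (maximum size in bytes). The sizing helper is a back-of-the-envelope assumption, not something from the support case: the channel should hold at least as many bytes as arrive while the forwarder is behind, times a safety factor. The event rate, event size, lag window, safety factor, and 64 KiB rounding are all illustrative assumptions.

```python
import math
from typing import List

# Assumption: sizes are rounded up to 64 KiB, matching the increments Event Viewer uses.
BLOCK = 64 * 1024


def min_log_bytes(events_per_sec: float, avg_event_bytes: int,
                  lag_sec: float, safety: float = 2.0) -> int:
    """Bytes a channel needs to hold `lag_sec` worth of events, times a safety factor."""
    raw = math.ceil(events_per_sec * avg_event_bytes * lag_sec * safety)
    return -(-raw // BLOCK) * BLOCK  # round up to the next 64 KiB boundary


def wevtutil_set_log_args(channel: str, max_bytes: int) -> List[str]:
    """argv for `wevtutil sl`: /rt:false = overwrite as needed, /ms = max size in bytes."""
    return ["wevtutil.exe", "sl", channel, "/rt:false", f"/ms:{max_bytes}"]


size = min_log_bytes(events_per_sec=100, avg_event_bytes=1000, lag_sec=60)
print(size)  # 12058624
print(" ".join(wevtutil_set_log_args("Security", size)))
```

The argv list can be passed to `subprocess.run` from an elevated prompt on the Windows host.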
DO NOT install Splunk to the default Windows file path, i.e. C:\Program Files\Splunk. You'll find that Splunk can't create certain temp files (hashed search temp directories and files, to be specific) because the resulting paths exceed the Windows 260-character path limit. What ends up happening is that Splunk sucks up all the available RAM until splunkd finally crashes completely, since it has nowhere to store things temporarily.
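To see whether an install is already close to the limit, a small scanner can be pointed at the Splunk install directory. A minimal Python sketch; the function names are hypothetical, and note that Python on Windows is subject to the same limit unless long paths are enabled, so directories it cannot open are silently skipped by `os.walk`'s default error handling.

```python
import os
from typing import List, Tuple

# Win32 MAX_PATH is 260 characters *including* the terminating NUL,
# so the longest usable path string is 259 characters.
MAX_PATH = 260


def too_long(path: str, limit: int = MAX_PATH) -> bool:
    """True if `path` plus its NUL terminator would not fit in `limit` characters."""
    return len(path) + 1 > limit


def find_long_paths(root: str, limit: int = MAX_PATH) -> List[Tuple[int, str]]:
    """(length, path) for every file/directory under root that breaks the limit,
    longest first. Directories os.walk cannot open are silently skipped."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if too_long(full, limit):
                hits.append((len(full), full))
    return sorted(hits, reverse=True)
```

Passing a lower `limit` (say 240) to `find_long_paths(r"C:\Program Files\Splunk", limit=240)` surfaces the near misses before they become failures.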
Amazingly, the longest file path I've ever seen Splunk fail to create is 264 characters. Really? Over by 4 characters? Although I suppose it could be longer or shorter depending on the name of the search head referenced in the path. You Splunk coders couldn't have found somewhere to shorten this file path, or keep it under the maximum number of characters allowed by the OS?
I mean, you've already got multiple hashes going on here; can't you just remove the index_buckets.csv. part and save like 17 characters?
It just goes back to the whole issue of Windows installs being treated as second-class installs.
Yes, supposedly this restriction can be lifted in Windows 10, and hopefully the same is true for Server 2016, but I haven't found any documentation that definitively states this one way or the other.
Also, given that it has to be enabled via either a registry key or Group Policy, and that there are other caveats, I don't fully trust Windows to support this functionality, nor Splunk's ability to use it reliably.
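For checking whether the registry switch is even on, the setting lives in the `LongPathsEnabled` DWORD under `HKLM\SYSTEM\CurrentControlSet\Control\FileSystem`. A minimal Python sketch using the standard `winreg` module (the function name is hypothetical). Keep in mind that, per Microsoft's documentation, each application must also declare itself long-path aware in its manifest, so a value of 1 here says nothing about whether Splunk honors it.

```python
import sys
from typing import Optional


def long_paths_enabled() -> Optional[bool]:
    """True/False from the LongPathsEnabled registry value on Windows;
    None when not on Windows or when the value is absent."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                            r"SYSTEM\CurrentControlSet\Control\FileSystem") as key:
            value, _value_type = winreg.QueryValueEx(key, "LongPathsEnabled")
    except FileNotFoundError:
        return None
    return value == 1
```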
Trust me, you're better off avoiding the whole issue entirely and just installing to D:\Splunk, or whichever drive letter you prefer. Better yet, go *nix.
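The savings from a shorter install prefix are easy to quantify. Assuming the 264-character path above was created under the default C:\Program Files\Splunk, and that only the prefix changes (the search head name and hashes stay the same), the arithmetic is:

```python
def relocated_length(path_len: int, old_prefix: str, new_prefix: str) -> int:
    """Length of the same path after moving SPLUNK_HOME from old_prefix to new_prefix."""
    return path_len - len(old_prefix) + len(new_prefix)


old = r"C:\Program Files\Splunk"  # 23 characters
new = r"D:\Splunk"                # 9 characters
print(relocated_length(264, old, new))  # 250, which fits under MAX_PATH (260 incl. NUL)
```

The 14 characters saved by the shorter prefix are what pull that particular path back under the limit; a longer search head name could still eat the margin.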
You have to be careful which editor you use to create configuration files. With certain editors I've used, newlines got messed up, and there is really no way to tell: when you open the file and look at it, it looks fine, but Splunk will (sometimes silently) fail to understand the lines. I constantly have this problem on Windows and never have any problems like this on *nix. The safest thing to do is to ONLY use Notepad++ or a Linux-style tool like MobaXterm that provides vi, etc., to edit .conf files.
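Since the problem is invisible in the editor, it helps to look at the raw bytes. A minimal Python sketch that counts line-ending styles and flags a UTF-8 BOM in a .conf file; which of these Splunk actually chokes on is not confirmed in this thread, so the sketch only makes the differences visible, and normalizing to LF is an assumed "safe" target rather than a documented requirement.

```python
from typing import Dict, Union

BOM = b"\xef\xbb\xbf"


def newline_report(data: bytes) -> Dict[str, Union[int, bool]]:
    """Count CRLF, lone LF and lone CR line endings; flag a UTF-8 BOM and mixed styles."""
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    styles_used = sum(1 for n in (crlf, lone_lf, lone_cr) if n)
    return {"crlf": crlf, "lf": lone_lf, "cr": lone_cr,
            "bom": data.startswith(BOM), "mixed": styles_used > 1}


def normalize(data: bytes) -> bytes:
    """Strip a UTF-8 BOM and convert CRLF and lone CR to LF."""
    if data.startswith(BOM):
        data = data[len(BOM):]
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Usage: open the file in binary mode (`open(path, "rb")`) and pass `f.read()` to `newline_report`; anything showing `"mixed": True` or a nonzero `"cr"` count is worth a closer look.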
Splunk's SIEM offering, Enterprise Security (ES for short), is NOT supported for Search Head Clustering on Windows (but is on *NIX). The docs here: https://docs.splunk.com/Documentation/ES/latest/Install/DeploymentPlanning#Splunk_Enterprise_Securit... say:
Splunk Enterprise Security supports installation on Linux-based search head clusters only. At this time, Windows search head clusters are not supported by Splunk Enterprise Security.
It seems an app created on Windows cannot become Splunk Certified, since it fails the file-permission (644) check in Splunk AppInspect. Passing AppInspect is a prerequisite for Splunk Certified apps.
When troubleshooting performance problems on Windows, the Monitoring Console (MC) is incredibly useful, specifically the Resource Usage: Machine dashboard, digging down into the disk subsystem metrics. Sometimes the problem is really obvious (service times in the 100s of milliseconds, or wait times higher than 20-30 ms). Sometimes it isn't. Unfortunately, we don't yet collect disk queue length for reads/writes (SPL-147262 has been filed to collect this in _introspection data and present it in the Monitoring Console).
The Windows Performance Monitor application can display this info, if you know where to look: go to Performance -> Monitoring Tools -> Performance Monitor and select PhysicalDisk as the object under System. Avg. Disk Queue Length, Avg. Disk Read Queue Length, and Avg. Disk Write Queue Length are the three counters we looked at for each of the drives (C: and E: in this example). Seeing disk queues for reads and writes on a system that was not very busy, with SSD for C: and effectively no I/O going to E:, made it very obvious that we had a problem that wasn't specifically Splunk-related but OS/storage-related.
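To capture the same counters from a script instead of the GUI, Windows' `typeperf` tool can sample them to CSV. A rough Python sketch that runs `typeperf` and averages each counter; the parser assumes typeperf's CSV layout (a header row of counter paths, one row per sample with the timestamp first, plus a few trailing status lines), and the 12-sample default is an arbitrary choice.

```python
import csv
import io
import subprocess
from typing import Dict, List

COUNTERS = [
    r"\PhysicalDisk(*)\Avg. Disk Queue Length",
    r"\PhysicalDisk(*)\Avg. Disk Read Queue Length",
    r"\PhysicalDisk(*)\Avg. Disk Write Queue Length",
]


def sample(counters: List[str] = COUNTERS, samples: int = 12) -> str:
    """Run typeperf (Windows only) for `samples` one-second samples; return its CSV text."""
    cmd = ["typeperf", *counters, "-sc", str(samples)]
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def average_counters(csv_text: str) -> Dict[str, float]:
    """Average each counter column, skipping status lines and blank values."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header = rows[0]
    sums = {name: 0.0 for name in header[1:]}
    counts = {name: 0 for name in header[1:]}
    for row in rows[1:]:
        if len(row) != len(header):
            continue  # e.g. "The command completed successfully."
        for name, raw in zip(header[1:], row[1:]):
            try:
                sums[name] += float(raw)
                counts[name] += 1
            except ValueError:
                pass  # non-numeric cell, e.g. " " when a value is unavailable
    return {name: sums[name] / counts[name] for name in sums if counts[name]}
```

On the Windows host, `average_counters(sample())` returns one average per disk instance and counter.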
Refer to the issue in the following question: using the tostring() function to convert a 64-bit integer to hexadecimal works fine on 64-bit Linux but fails on 64-bit Windows.
As stated there, the workaround is to use the printf() function to perform this conversion of a 64-bit integer to hexadecimal, which offers more options (control) over the conversion and works fine on Windows as well 🙂
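Why would the same eval behave differently on the two platforms? One plausible but unconfirmed explanation is the C data model: 64-bit Linux is LP64 (C `long` is 64 bits), while 64-bit Windows is LLP64 (C `long` is only 32 bits), so any conversion routed through `long` silently loses the upper half on Windows. Splunk's source isn't public, so whether tostring() actually does this is a guess; the Python sketch below only demonstrates the effect using `ctypes`, and both function names are hypothetical.

```python
import ctypes

MASK64 = (1 << 64) - 1


def hex64(n: int) -> str:
    """Hex of n as an unsigned 64-bit value (what the Linux result looks like)."""
    return format(n & MASK64, "x")


def hex_via_c_long(n: int) -> str:
    """Hex of n after squeezing it through the platform's C `long`.
    On LP64 Linux `long` is 64 bits; on LLP64 Windows it is 32 bits,
    so the high half of the value is silently dropped."""
    bits = 8 * ctypes.sizeof(ctypes.c_long)
    return format(ctypes.c_long(n).value & ((1 << bits) - 1), "x")


n = 0x123456789ABCDEF0
print(hex64(n))           # 123456789abcdef0
print(hex_via_c_long(n))  # same on Linux; 9abcdef0 where `long` is 32 bits
```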
Windows routinely has problems with long file names and long paths. Almost every Splunk release has listed Windows-ONLY bugs on this. It is so bad and commonplace that Splunk now publishes a section JUST FOR WINDOWS BUGS in the release notes of every release: