Getting Data In

What are the pain points with deploying your Splunk architecture on Windows OS?

woodcock
Esteemed Legend

I am growing very tired of being asked to justify my "undocumented" and "bigoted" best-practice of NEVER deploying Splunk infrastructure (Search Heads, Indexers, License Manager, Cluster Master, Deployer, Deployment Server, Monitoring Console, etc.) on any Windows OS. I am sure many of you have faced the same challenge. I have created this question so that we can build a canonical list, behind a single URL that we can all share, where the best and brightest of us can describe our past pain with the kind intention of helping others avoid the Windows path of perfectly-avoidable regret. If you think that you will use this Q&A as a reference point, then please do me-too the question. If you have just cause to avoid Windows then P*L*E*A*S*E post your answer. Remember, friends don't let friends deploy on Windows: let's give them the facts that they need to successfully push back. Please include links to documented disasters when possible. Keep in mind that I probably will never accept any answer to this question (to encourage others to participate in perpetuity). Let's do one objection per answer and vote on the best objections so that the most-important ones will filter to the top.
ATTENTION!!! ATTENTION!!!!
THERE ARE NOW MORE ANSWERS THAN FIT ON A SINGLE PAGE (NOTE PAGINATION CONTROLS AT THE BOTTOM)!

PickleRick
Ultra Champion

A typical way of restarting splunkd.exe (going into the Services manager and restarting the service instead of running "splunk restart", so that you don't have to bother with Splunk admin accounts) often ends with a "the service failed to stop" message because the default timeouts for services are far too short. The Linux equivalents, "service splunk restart" or "systemctl restart splunk", don't typically do that.

0 Karma

gcusello
Esteemed Legend

Hi @woodcock,

I completely agree with you: I have been working with Splunk for eleven years and I have seen only one production system running on Windows. It was not very large (40 GB/day and fewer than 40 clients), and even they eventually migrated to Linux!

I found many problems with Windows systems, including higher resource requirements (e.g. a completely useless graphical interface), and especially many problems with a Windows Deployment Server, which is unable to deploy apps to Linux servers.

Windows can be used (maybe) for testing.

Ciao to all.

Giuseppe

0 Karma

kearaspoor
Communicator

I don't have links but from personal experience, the Windows OS has a higher overhead so all the sizing recommendations are undersized compared to real-world behavior. 

In our specific scenario, measuring CPU/memory utilization in the late days of v5 (pre-clustering), we saved nearly half of the available resources by moving from Windows to Linux: we were able to cut out roughly half of our hardware and still saw better performance. There have been vast improvements in how Splunk uses available resources since then, so I suspect the differences won't be so egregious in any of the 7.x, 8.x or 9.x versions, but I would also be shocked if Windows ever performed better than *nix, given that so much of Splunk development starts with *nix as the assumed OS.

0 Karma

Jason
Motivator

Some versions of Windows (which?) may run out of network ports with default settings.

https://docs.splunk.com/Documentation/Splunk/latest/ReleaseNotes/Workaroundfornetworkaccessibilityis...

0 Karma

Jason
Motivator

wow, page 4!

0 Karma

MuS
SplunkTrust

Actually there is ONE good thing about running Splunk on Windows!

When you are running Splunk 7.2.2 and later you don't have to worry about the systemd madness 😉

cheers, MuS

dnitschke_splun
Splunk Employee

According to the documentation, Workload Management is only supported on Linux operating systems. https://docs.splunk.com/Documentation/Splunk/latest/Workloads/Requirements

woodcock
Esteemed Legend

This is the feature that allows you to keep bad users/searches from impacting other users/searches:

https://docs.splunk.com/Documentation/Splunk/latest/Workloads/Aboutworkloadmanagement

0 Karma

MuS
SplunkTrust

Default behaviour on Windows Event logs and how they are rolled over will result in event loss!

Confirmed by multiple customers and Splunk support (Case# 1214731), answer from Support:

Basically, when the log reaches its maximum size, the O/S overwrites the file, and that does not give us enough time to finish reading the events.
Thus, they recommend setting "Overwrite events as needed (oldest events first)" instead.
I am not sure if it's possible, but if the O/S could save the events to another folder just before archiving, we could monitor that folder instead.
However, when I look at the EventLog properties, that feature does not exist.
Please use "Overwrite events as needed (oldest events first)" to avoid the missing-event issue.

cheers, MuS

0 Karma

woodcock
Esteemed Legend

This will result in blind spots when monitoring your infrastructure. Obviously for non-infrastructure Windows hosts, it is what it is.

0 Karma

michael_schmidt
Path Finder

DO NOT install Splunk to the default Windows file path, i.e. C:\Program Files\Splunk. You'll find that Splunk can't create certain temp files (hashed search temp directories and files, to be specific) because the resulting file path violates the Windows 260-character path limit. What ends up happening is that Splunk sucks up all the available RAM until splunkd finally crashes completely, since it has nowhere to store things temporarily.

Amazingly, the longest file path I've ever seen Splunk try to create unsuccessfully is 264 characters. Really? 4 Characters? Although I suppose that it could be longer or shorter depending on the name of the Search head referenced in the path. You Splunk coder guys couldn't have found somewhere to shorten up this file path or restrict it to less than the maximum number of characters allowed by the OS?

For example:

*C:\Program Files\Splunk\var\run\splunk\dispatch\remote_SearchHeadName_scheduler__admin_c3BsdW5rX2FwcF93aW5kb3dzX2luZnJhc3RydWN0dXJl__RMD5e93ff07c552f3ee0_at_1477516800_3187_F5AAE4E2-7A34-4327-8CDA-83913FB48502\index_buckets.csv.647C07D6-2813-4D98-AD2E-ED1FCACEB554.tmp*

I mean, you've already got multiple Hashes going on here, can't you just remove the index_buckets.csv. part and save like 17 characters?

It just goes back to the whole issue of Windows installs being treated like second-class installs.

Yes, supposedly this restriction is removable in Windows 10, and hopefully that means the same is true for Server 2016, but I haven't found any documentation to definitively state that one way or another.

Also given that there's a requirement to enable it via either registry key, or Group Policy, and that there are other caveats, I don't fully trust Windows to support this functionality, nor Splunk's ability to access it reliably.

Trust me, you're better off avoiding the whole issue entirely, and just installing to C:\Splunk or D:\Splunk or whatever drive letter you'd prefer. Better yet, Go NIX.
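
To get a feel for how much room a given install root leaves, a rough sketch like the following may help. It assumes the classic 260-character limit mentioned above and reuses the relative part of the example path; the names `full_path_length` and `headroom` are purely illustrative and are not part of Splunk.

```python
# Rough sketch: how much path-length headroom does an install root leave?
# Assumes the classic Windows limit of 260 characters discussed above.
MAX_PATH = 260

# Relative part of the example path (everything below the install root).
EXAMPLE_RELATIVE_PATH = (
    r"var\run\splunk\dispatch"
    r"\remote_SearchHeadName_scheduler__admin_"
    r"c3BsdW5rX2FwcF93aW5kb3dzX2luZnJhc3RydWN0dXJl"
    r"__RMD5e93ff07c552f3ee0_at_1477516800_3187_"
    r"F5AAE4E2-7A34-4327-8CDA-83913FB48502"
    r"\index_buckets.csv.647C07D6-2813-4D98-AD2E-ED1FCACEB554.tmp"
)


def full_path_length(install_root, relative_path=EXAMPLE_RELATIVE_PATH):
    """Length of install_root + one backslash + relative_path."""
    return len(install_root.rstrip("\\")) + 1 + len(relative_path)


def headroom(install_root, relative_path=EXAMPLE_RELATIVE_PATH):
    """Characters left before the limit; a negative value means over the limit."""
    return MAX_PATH - full_path_length(install_root, relative_path)


if __name__ == "__main__":
    for root in (r"C:\Program Files\Splunk", r"C:\Splunk", r"D:\Splunk"):
        print(root, full_path_length(root), headroom(root))
```

The actual length depends on the search head name and the search ID, so treat the numbers as a comparison between install roots rather than a guarantee.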

woodcock
Esteemed Legend

According to this documentation, the ES Content Update app only supports Linux:

http://docs.splunk.com/Documentation/ESSOC/latest/user/Install

0 Karma

woodcock
Esteemed Legend

NOTE: In the very latest releases, this is no longer the case (Windows is listed as supported).

0 Karma

woodcock
Esteemed Legend

You have to be careful which editor you use to create configuration files. Whenever I use Notepad, the newlines get messed up, but there is really no way to tell. When you open the file and look at it, it looks fine, but Splunk will (sometimes silently) not understand the lines. I constantly have this problem on Windows and never have any problems like this on *NIX. The safest thing to do is to ONLY use Notepad++ or a Linux-like tool such as MobaXterm that allows vi, etc. to edit .conf files.
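
One way to make such invisible differences visible is to look at the raw bytes of a .conf file. The sketch below assumes the usual suspects are Windows-style CRLF line endings and a UTF-8 byte-order mark that some editors add silently; whether either one actually trips up a given Splunk version is an assumption, not something confirmed here. The function names are hypothetical.

```python
import codecs


def inspect_conf_bytes(data):
    """Report byte-level details that look identical in most editors."""
    crlf = data.count(b"\r\n")
    # Bare LF endings: all LF bytes minus those that belong to a CRLF pair.
    lf_only = data.count(b"\n") - crlf
    # Bare CR endings: all CR bytes minus those that belong to a CRLF pair.
    cr_only = data.count(b"\r") - crlf
    return {
        "has_utf8_bom": data.startswith(codecs.BOM_UTF8),
        "crlf_lines": crlf,
        "lf_only_lines": lf_only,
        "cr_only_lines": cr_only,
        "mixed_line_endings": sum(1 for n in (crlf, lf_only, cr_only) if n) > 1,
    }


def inspect_conf_file(path):
    """Same report, read from a file opened in binary mode."""
    with open(path, "rb") as fh:
        return inspect_conf_bytes(fh.read())


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + b"[default]\r\nhost = myhost\n"
    print(inspect_conf_bytes(sample))
```

Reading in binary mode matters here: text mode would translate the line endings and hide exactly what you are trying to find.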

lcrielaa
Communicator

Splunk ES on SHCs is not supported on Windows but is supported on Linux.
http://docs.splunk.com/Documentation/ES/5.0.0/Install/DeploymentPlanning#Splunk_Enterprise_Security_...

0 Karma

woodcock
Esteemed Legend

Splunk's SIEM offering, Enterprise Security (or ES for short), is NOT supported for Search Head Clustering on Windows (but is for *NIX). The docs here: https://docs.splunk.com/Documentation/ES/latest/Install/DeploymentPlanning#Splunk_Enterprise_Securit... say: "Splunk Enterprise Security supports installation on Linux-based search head clusters only. At this time, Windows search head clusters are not supported by Splunk Enterprise Security."

0 Karma

niketn
Legend

It seems an app created on Windows cannot become Splunk Certified, because it fails the 644 file-permission test in Splunk AppInspect. Passing that test is a prerequisite for Splunk Certified apps.

Refer to my question: https://answers.splunk.com/answers/607533/app-certification-criteria-i-need-help-on-the-644-1.html
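
One possible workaround, sketched below, is to set the permission bits while building the app archive instead of relying on whatever the Windows filesystem reports. This assumes the check looks at the mode bits stored in the .tar.gz package and expects 644 for files and 755 for directories; that is an assumption on my part, not something confirmed in the linked question. The function name `package_app` and the paths in the usage example are hypothetical.

```python
import tarfile


def _normalize_mode(info):
    """Force POSIX-style modes on each archive member as it is added."""
    if info.isdir():
        info.mode = 0o755
    elif info.isfile():
        info.mode = 0o644
    return info


def package_app(app_dir, archive_path, app_name):
    """Create a gzip-compressed tar of app_dir with normalized permissions."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # The filter is applied to every member added recursively.
        tar.add(app_dir, arcname=app_name, filter=_normalize_mode)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    package_app(r"C:\dev\my_app", r"C:\dev\my_app.tar.gz", "my_app")
```

Only the modes recorded inside the archive change; the files on disk are left untouched.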

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

davidpaper
Contributor

When troubleshooting performance problems on Windows, the MC is incredibly useful, specifically Resource Usage: Machine and digging down into the Disk subsystem metrics. Sometimes the problem is really obvious (service times in the 100s of milliseconds, or wait times higher than 20-30ms). Sometimes it isn't. Unfortunately, we don't yet collect disk queue length for reads/writes (SPL-147262 has been filed to collect this in _introspection data and present it in the Monitoring Console).

The Windows Performance Monitor application can display this info, if you know where to look. In this case, go to Performance -> Monitoring Tools -> Performance Monitor, and select PhysicalDisk as the object under System. Avg Disk Queue Length, Avg Disk Read Queue Length, and Avg Disk Write Queue Length are the three items we looked at for each of the drives (C: and E: in this example). Seeing disk queues for reads and writes on a system that was not very busy, had an SSD for C:, and had effectively no I/O going to E:, it became very obvious that we had a problem that wasn't specifically Splunk related, but OS/storage related.

0 Karma

niketn
Legend

Refer to the issue described in the question linked below. Using the tostring() function to convert a 64-bit integer to hexadecimal works fine on a 64-bit Linux system, but fails on a 64-bit Windows system.

As stated there, the workaround is to use printf() for the 64-bit integer to hexadecimal conversion, which gives more options (control) over the conversion and works fine on Windows systems as well 🙂

https://answers.splunk.com/answers/550028/how-to-convert-64-bit-number-from-decimal-to-hex.html?chil...

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

woodcock
Esteemed Legend

Windows routinely has problems with long file names and long paths. Almost every release of Splunk has listed bugs ONLY for Windows on this. It is so bad and commonplace that Splunk now publishes a section JUST FOR WINDOWS BUGS in every set of release notes:

https://docs.splunk.com/Documentation/Splunk/latest/ReleaseNotes/KnownIssues#Windows-specific_issues
