Is anyone else experiencing any kind of performance hit after installing the Splunk Add-on for Microsoft Cloud Services? We installed the add-on in our Development environment (single-server setup) and Processor Utilization (2 vCPU) almost immediately jumps to 100% and stays there. If we disable the app, CPU seems to normalize. We've not even been able to get to the point of configuring the app... just curious whether anyone else is seeing similar results?
We are seeing high write quantities (over 2 GB) that don't equate to our source input. We are on a 1 TB drive (5000 IOPS), 8-core, 16 GB RAM F8s v2. The blob storage container we have as an input may add 5-8 thousand files over a weekend, but these are tiny log files -- about 10 KB each. I fear, as suggested above, that it is looking at all files and appending the deltas / new files. Any suggestions on how we can mitigate further? (Previous upgrades to the VM have had no effect, and MS has assured us all is well on their disk backend.)
The performance issue appears to be due to the number of Python scripts initiated by the custom modular inputs and the frequency at which they are scheduled. When you first install, there are so many Python scripts running that the CPU jumps to 100% and barely drops down. This is even with no inputs defined on my side.
The two changes I have made to address the issues are as follows:
1) I don't use the Office 365 inputs, so I have renamed the following Python scripts under the /bin folder: ms_0365_account_monitoring.py, ms_0365_management.py, ms_0365_ucc_server.py
2) Changed the "interval" value specified in the default/inputs.conf file for all inputs. Even though these inputs are all set to disabled, the interval seems to be an internal setting that decides how often the Python script is initiated. In some cases, such as tables/blobs, this causes the Python script to run every 10 seconds.
After these two changes, the performance of the server returned to a normal level.
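For reference, change 2 can be sketched as a local override rather than editing default/inputs.conf directly (the stanza names below are illustrative -- copy the exact stanza headers from the add-on's own default/inputs.conf and override only the interval):

```
# $SPLUNK_HOME/etc/apps/Splunk_TA_microsoft-cloudservices/local/inputs.conf
# Stanza names are illustrative; match them to default/inputs.conf.

[mscs_storage_blob]
interval = 3600

[mscs_storage_table]
interval = 3600
```

Overriding in local/ survives app upgrades, whereas edits to default/ get clobbered.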
The app comes with modular inputs to do the collection of the Microsoft Azure data. (Check with ps aux | grep splunk; you should see the modules running and get more details.)
It may be very busy, so you may want to use a dedicated instance for it, so as not to overload an existing search head.
Correct. It's on a dedicated instance, but the lack of reaping of the massive number of modinputs files is killing a very large physical server (40 cores / 96 GB).
I see the exact same situation in a total greenfield scenario.
On an AWS c4.xlarge instance (4 cores, 7.5 GB RAM) with just Splunk 7.0.2 and version 2.1.0 of the Splunk Add-on for Microsoft Cloud Services, I max out all 4 cores.
I have done zero configuration: just installed Splunk, installed the add-on, and restarted Splunk afterwards.
Upping the instance to a c4.2xlarge with 8 cores and 15 GB of RAM, it's starting to become "usable".
In comparison to DB Connect, which I have running on a c4.large (2 cores, 3.75 GB RAM), this is ridiculous!
I'm having the same issue.
The moment I disable the add-on, CPU usage comes back to normal. Once I enable it, it jumps straight to 100% and stays there.
I never even started to configure anything, yet I'm having this CPU usage issue.
Could anyone shed some light on this?
Wondering if you're running into the same issue that I've had: this TA seems to generate thousands of files under $SPLUNK_HOME/var/lib/splunk/modinputs/mscs_storage_blob and never reap them. I noticed performance getting progressively worse. Eventually I cleared out the directories underneath and restarted, which resolved the performance issue, but then we started collecting blobs that we'd already gotten before.
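In case it helps anyone doing the same housekeeping: an age-based reaper is less drastic than clearing the whole directory, since recent checkpoints survive and only stale ones go. This is a sketch under assumptions, not part of the add-on -- the add-on does not document a safe retention window, so the 30-day cutoff here is a guess; deleting a checkpoint that is still live will cause that blob to be re-collected, so test on a non-production host first.

```python
import os
import time

def reap_old_checkpoints(checkpoint_dir, max_age_days=30):
    """Delete checkpoint files older than max_age_days.

    checkpoint_dir would be something like
    $SPLUNK_HOME/var/lib/splunk/modinputs/mscs_storage_blob.
    The retention window is an assumption, not a documented value.
    Returns the number of files removed.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for root, _dirs, files in os.walk(checkpoint_dir):
        for name in files:
            path = os.path.join(root, name)
            # mtime is the last time the TA touched this checkpoint
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed += 1
    return removed
```

Run it from cron while Splunk is stopped (or at least while the blob input is disabled) to avoid racing the modular input.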
Apologies if this has nothing at all to do with your issue.
I'm having the same problems. The delta mechanism requires the ton of checkpoint files you're speaking of. I've had to write my own code for table storage, but it isn't very performant either. When pulling IIS logs from PaaS devices, it can barely keep up. I worked around it by only writing checkpoints once for every 100 events. Unfortunately that's not something you can configure in any Splunk app I've seen; I did it in my own code.
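The batching idea above can be sketched roughly like this. To be clear, this is a minimal illustration, not the add-on's actual code: the write_checkpoint callback and the batch size of 100 are assumptions matching the workaround described, and the trade-off is that up to one batch of events may be replayed after a crash.

```python
class BatchedCheckpointer:
    """Persist a checkpoint every `batch_size` events instead of per
    event, cutting disk writes ~100x at the cost of a small replay
    window on restart. Sketch only -- not the TA's real code."""

    def __init__(self, write_checkpoint, batch_size=100):
        self._write = write_checkpoint  # callable(marker) that persists the position
        self._batch_size = batch_size
        self._pending = 0
        self._last_marker = None

    def record(self, marker):
        """Call after each event is indexed; `marker` identifies the
        read position (e.g. a table row key or blob offset)."""
        self._last_marker = marker
        self._pending += 1
        if self._pending >= self._batch_size:
            self.flush()

    def flush(self):
        """Persist the newest marker; also call this on shutdown so
        the tail of a partial batch isn't replayed unnecessarily."""
        if self._pending and self._last_marker is not None:
            self._write(self._last_marker)
            self._pending = 0
```

Because only the newest marker is written, recovery resumes from the last flushed position and re-reads at most batch_size - 1 events, which is fine as long as ingestion is idempotent or duplicates are tolerable.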
Experiencing the same issues, also on a 2 vCPU server with no configuration after the initial install. I realize that 2 vCPU may not be adequate for production, but I'm just trying to test, and the 95+% CPU utilization does not give me the "warm fuzzies" about this app. I'm using version 2.0.2 of the app and version 6.5.1 of Splunk Enterprise.
All, Which version of the app please?
Running on 6.6.0, MSCS 2.1.0.
Running on Azure and not exactly overpowered, as this is a Dev environment, but installing this took both the search head and indexer from systems that spike to 100% (running ES 4.7.4) to systems that were pegged at 100% even after I disabled the two inputs I had created. I also note the absence of a disable option, so it's not easy to quantify the impact: you have to shut down Splunk, move the app out, and restart Splunk. The Platform Requirements section of the documentation seems a little underbaked given what we're seeing. Is there anything a little deeper on system impact relative to blob storage / table storage volumes, etc.?
I was using the 6.3.3 version of Splunk with the Cloud Services v2.0.2 add-on. After noticing the CPU usage spiking terribly, I upgraded Splunk to the latest version -- yet still the same.
splunkd.exe and python.exe seem to be spawning all over the place.
As of now, the Cloud Services add-on remains disabled.
-- Using a VM (VMware) with 8 GB RAM. Only the PingFederate add-on was enabled.
Hello
We have encountered a problem with the add-on when pulling blobs, with a lot of blobs in our container.
The "blob_mode" parameter takes only the value "append", so Splunk always tries to pull all blobs even if there has been no modification after creation. A one-shot mode does not seem to be available.
I do. CPU stuck at 100%. No idea why so far.
You might have a disk I/O bottleneck that is causing this. Try using premium disks, or at least stripe across 4 disks.
Thank you. This particular issue has only to do with poor housekeeping on the side of the add-on, and there appears to be no solution for it. Having to deal with thousands of new files every hour without any consideration for reaping what's no longer relevant is, in my opinion, the mark of a poorly architected app.