Getting Data In

Nightly install of Universal Forwarder, random regmon crashes

FloydATC
Explorer

We are using Citrix PVS to provision fresh XenApp servers every night, about 60 of them in total. A few dozen applications are then script-installed after boot; the UF is one of them.

This works well most of the time, but almost every day one or two terminal servers have problems right off the bat: the process "splunk-regmon.exe" crashes over and over, generating huge numbers of log messages in the local splunk.log file and in the _internal index.

On one occasion, it also started spewing crash .dmp files, quickly filling the disk with 11 GB of them before the server retired itself.

INFO TailingProcessor - Ignoring file 'C:\Program Files\SplunkUniversalForwarder\var\log\splunk\C__Program Files_SplunkUniversalForwarder_bin_splunk-regmon_exe_crash-2015-02-27-07-55-53.dmp' due to: binary
ERROR ExecProcessor - message from ""C:\Program Files\SplunkUniversalForwarder\bin\splunk-regmon.exe"" splunk-regmon - run_regmon: Fail to run Registry Monitor: 'bad allocation'

We use the following command line to install Universal Forwarder:

splunkforwarder-6.1.3-220630-x64-release.msi AGREETOLICENSE=Yes RECEIVING_INDEXER="splunk.######:9997" DEPLOYMENT_SERVER="splunk.######:8089" LAUNCHSPLUNK=1

(Domain names redacted)
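
(In the nightly script this runs unattended; wrapped in msiexec with the standard /quiet and /l*v Windows Installer switches it would look roughly like the following, with an arbitrary log path so that failed installs leave evidence behind:)

msiexec.exe /i splunkforwarder-6.1.3-220630-x64-release.msi AGREETOLICENSE=Yes RECEIVING_INDEXER="splunk.######:9997" DEPLOYMENT_SERVER="splunk.######:8089" LAUNCHSPLUNK=1 /quiet /l*v C:\Windows\Temp\uf-install.log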

The terminal servers then receive apps from the deployment server; here are the configs:

[WinEventLog://Application]
disabled = 0
index = windows
start_from = oldest
current_only = 1

[WinEventLog://Security]
disabled = 0
index = windows
start_from = oldest
current_only = 1

[WinEventLog://System]
disabled = 0
index = windows
start_from = oldest
current_only = 1

[WinEventLog://ForwardedEvents]
checkpointInterval = 5
current_only = 1
disabled = 0
start_from = oldest
index = windows

[WinEventLog://Setup]
checkpointInterval = 5
current_only = 1
disabled = 0
start_from = oldest
index = windows

[admon://default]
disabled = 1
monitorSubtree = 1

[WinRegMon://default]
disabled = 1
hive = .*
proc = .*
type = rename|set|delete|create

[WinRegMon://hkcu_run]
disabled = 1
hive = \\REGISTRY\\USER\\.*\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\.*
proc = .*
type = set|create|delete|rename

[WinRegMon://hklm_run]
disabled = 1
hive = \\REGISTRY\\MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\\.*
proc = .*
type = set|create|delete|rename

[monitor://C:\Progra~1\Tivoli\TSM\baclient\dsmerror.log]
disabled = false
sourcetype = dsmerror
index = tsm
current_only = 1

[monitor://C:\Progra~1\Tivoli\TSM\baclient\dsmsched.log]
disabled = false
sourcetype = dsmsched
index = tsm
current_only = 1

[WinEventLog:Microsoft-Windows-PrintService/Operational]
disabled = 0
index = windows
start_from = oldest
current_only = 1

[WinEventLog:Microsoft-Windows-PrintService/Admin]
disabled = 0
index = windows
start_from = oldest
current_only = 1

As you can see from the deployed configuration files, all of the WinRegMon stanzas are disabled, so we are not using the registry monitor at all. Is there any way to stop splunk-regmon.exe from being executed entirely? Why does this happen on just one or two random, otherwise identical terminal servers booted from the same image? Is this a known problem with the UF version we are using?

bryan_dady
Explorer

Thanks for the tip to check out uberAgent for Splunk, but could this use case also be addressed without installing the UF at all?
Did you consider collecting from these XA nodes with a remote forwarder?
http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/ConsiderationsfordecidinghowtomonitorWindowsd...

FloydATC
Explorer

We did consider this, but the need to collect other log files means we still need a local agent. The UF would fit the bill perfectly if we could just iron out the wrinkles. It's working fine on 100+ other servers, so we just need to figure out what's causing random issues during nightly unattended installs.

bsonposh
Communicator

I'd also take a look at https://helgeklein.com/uberagent-for-splunk/. It does much of what you are looking for, and it doesn't require the UF.

FloydATC
Explorer

We considered uberAgent briefly, until we saw the price tag.

helge
Builder

While uberAgent is certainly not free, it might enable you to reduce the total data volume compared to alternative solutions.

dart
Splunk Employee

You used to be able to disable it completely like so:

 # $SPLUNK_HOME/etc/system/local/inputs.conf
[script://$SPLUNK_HOME\bin\scripts\splunk-regmon.path]
disabled = 1

But I've not got a Windows forwarder handy to test on.
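
To confirm which copy of the stanza wins once the deployment server has pushed everything out, btool on the forwarder should show the effective setting and the file it came from (standard btool usage, though I haven't verified the output on a 6.1 Windows UF):

splunk btool inputs list script --debug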

FloydATC
Explorer

And now the problem is back. Around 7 AM, one of the terminal servers died of Disk Full Syndrome under massive amounts of crash dumps. "splunk-regmon.exe" and "splunk-admon.exe" have crashed over and over ever since the forwarder finished installing around 4 AM this morning.

All the others are working just fine. This is driving me crazy.

jconger
Splunk Employee

If you are using PVS, then the setting to disable regmon would have been wiped out on reboot. I have 2 suggestions:

1) Use the Splunk deployment server to distribute a package to the universal forwarder that disables regmon (sketched below).
2) Make the universal forwarder part of your gold image, already set up the way you need it. (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Integrateauniversalforwarderontoasystemimag...)
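
For suggestion 1, a minimal deployment app could look something like this (the app name "disable_regmon" and the server class pattern are made up for illustration; the stanza itself is the one dart posted above). On the deployment server, in $SPLUNK_HOME/etc/deployment-apps/disable_regmon/local/inputs.conf:

[script://$SPLUNK_HOME\bin\scripts\splunk-regmon.path]
disabled = 1

And in $SPLUNK_HOME/etc/system/local/serverclass.conf:

[serverClass:xenapp_uf]
whitelist.0 = xa-*

[serverClass:xenapp_uf:app:disable_regmon]
restartSplunkd = true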

FloydATC
Explorer

1) This is what we already do. The setting is deployed, but it does not have the desired effect of preventing the crashes that fill the disk with coredumps.
2) This may be an option in the future if we can trust the UF. As of now, we can't.

FloydATC
Explorer

That seems to do the trick: over the weekend I have not seen a single instance of splunk-regmon.exe crashing, and it no longer shows up in the process list.

Do you know of a similar way to disable splunk-netmon.exe? That process also seems to crash randomly (although much less often).
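
(If netmon is launched through the same scripted-input mechanism, I would assume the analogous stanza is the one below. I haven't verified that a splunk-netmon.path file actually exists, so check $SPLUNK_HOME\bin\scripts for one first:)

[script://$SPLUNK_HOME\bin\scripts\splunk-netmon.path]
disabled = 1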
