Why are we missing Windows "WinEventLog:Security" event logs?

dionrivera
Path Finder

I have 40 Windows 2012 domain controllers (forwarding through heavy forwarders to cloud) that intermittently stop sending "WinEventLog:Security" events to the cloud indexers. In some cases, one of the servers will send Security events for a few hours and then stop sending altogether. I know the events exist on the server because I can see them in Event Viewer. On the other hand, I don't have the same issue with the Application or System events. They flow all the time. The issue only happens with "WinEventLog:Security" events.
So far, I have tried splitting the load among 4 heavy forwarders, thinking it was a forwarder congestion issue. I also configured the domain controllers to send directly to cloud, bypassing the heavy forwarders. Alas, no success.
Has anyone experienced or heard about this issue? Thank you.

scelikok
SplunkTrust

Hi @dionrivera,

The reason I mentioned a delay is that you are having the problem only with WinEventLog:Security events. Since the rest are flowing fine, it cannot be a congestion or throughput problem.

If you run your search above again, do the values increase? If yes, the events are delayed; if not, they have stopped.
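
One way to check this is to compare each event's index time with its event time. A minimal sketch, assuming the index name and placeholder hosts from the search posted in this thread (lag_seconds, max_lag, avg_lag and last_event are names made up for illustration):

```
index=windows_ad source="WinEventLog:Security" host IN (host1, host2, host3)
| eval lag_seconds = _indextime - _time
| stats count max(lag_seconds) AS max_lag avg(lag_seconds) AS avg_lag max(_time) AS last_event BY host
| eval last_event = strftime(last_event, "%F %T")
```

A large or growing max_lag points to delay. A last_event that stays frozen while Event Viewer shows newer events points to a stop.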

I think it is better for you to create a support case.

If this reply helps you an upvote and "Accept as Solution" is appreciated.

dionrivera
Path Finder

@scelikok Unfortunately, running the search again does not increase the values. I opened a ticket a few months ago on this issue and they recommended the changes below, with no success. Their final recommendation was to reboot the Windows servers once a week or upgrade from 2012 R2 to 2019 or newer. I will re-open the case and request more help.

 

1. Change evt_resolve_ad_obj from 1 to 0.
2. Increase the number of pipelines that handle incoming data to the number of CPUs on the host minus 1. In my case, I have 9 pipelines.

3. Modify outputs.conf [tcpout] stanza to:

[tcpout]
autoLBFrequency = 180
forceTimebasedAutoLB = false
autoLBVolume = 5000000
maxQueueSize = 25MB

scelikok
SplunkTrust

You must restart the UF service on those servers.

If this reply helps you an upvote and "Accept as Solution" is appreciated.

dionrivera
Path Finder

@scelikok I'm still seeing the same issue on most hosts as you can see below. You mentioned that the events are delayed and not dropped. Is that a good assumption? Also, I'm sharing my query in case this would be helpful.

I should mention that in addition to making these changes, we spun up 3 additional HFs thinking it was a congestion issue. But, we are seeing the same behavior across those HFs as well. Your help is appreciated.

index=windows_ad source="WinEventLog:Security" host IN (host1 host2 host3) | timechart count by host span=1h limit=50

 

[Attached screenshot: dionrivera_1-1676574905882.png]

scelikok
SplunkTrust

Hi @dionrivera,

In your config, current_only is set twice, and the value that actually applies is 1. This may cause missing events when you restart the forwarder service or the host. Please keep it as current_only = 0.

Please try the settings below (note the cache settings):

[WinEventLog://Security]
use_old_eventlog_api = true
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
checkpointInterval = 5
blacklist1 = EventCode="4662" Message="Object Type:(?!\s*groupPolicyContainer)"
blacklist2 = EventCode="566" Message="Object Type:(?!\s*groupPolicyContainer)"
renderXml = true
index = my_windows_ad
evt_ad_cache_exp = 1200
evt_ad_cache_exp_neg = 1200
evt_ad_cache_max_entries = 40000
evt_sid_cache_exp = 300
evt_sid_cache_exp_neg = 300
evt_sid_cache_max_entries = 4000
evt_dc_name = localhost

 

If you still have a delay, you may have another problem, and it is better to open a support case.

 

If this reply helps you an upvote and "Accept as Solution" is appreciated.

dionrivera
Path Finder

@scelikok Should I bounce the UF on the host servers for these changes to take effect?

scelikok
SplunkTrust

Setting evt_resolve_ad_obj = 0 will stop SID resolution. You will not be able to see usernames in the logs.

Please test with only use_old_eventlog_api = true.

If this reply helps you an upvote and "Accept as Solution" is appreciated.

dionrivera
Path Finder

@scelikok I changed evt_resolve_ad_obj back to 1.

Since changing use_old_eventlog_api to true, I still see the logs delayed or missing. I am including my stanza for this source. Let me know if you see anything that can be improved. I'm surprised this isn't a bigger deal with Splunk. I haven't seen any known-bug articles or bulletins for this issue. Thank you.

[WinEventLog://Security]
use_old_eventlog_api = true
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
checkpointInterval = 5
blacklist1 = EventCode="4662" Message="Object Type:(?!\s*groupPolicyContainer)"
blacklist2 = EventCode="566" Message="Object Type:(?!\s*groupPolicyContainer)"
renderXml = true
index = my_windows_ad
current_only = 1

dionrivera
Path Finder

@scelikok Thanks for the suggestion. I added your fix this morning, and I also changed

evt_resolve_ad_obj = 0 (previously 1)

as suggested by another Splunker. I'll check back tomorrow. Out of curiosity, does a 40 domain controller environment seem too large? Any other ideas on how to limit the traffic from this source?
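
To see where the volume comes from before adding more blacklist entries, a search along these lines could list the busiest event codes per host. This is only a sketch: it assumes the windows_ad index used elsewhere in this thread and that an EventCode field is extracted at search time, which with renderXml = true may depend on the Splunk Add-on for Windows being installed.

```
index=windows_ad source="WinEventLog:Security" earliest=-24h
| stats count BY host, EventCode
| sort - count
| head 20
```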

scelikok
SplunkTrust

Hi @dionrivera,

The Splunk Universal Forwarder resolves SIDs to usernames for WinEventLog:Security logs by querying the nearest DC. If your DCs are busy, this resolution takes more time and causes delays. If you check the logs, they should be arriving, just late. If that is the case, you can try adding the parameter below to use the old event log API for resolution.

[WinEventLog://Security]
use_old_eventlog_api = true

 

If this reply helps you an upvote and "Accept as Solution" is appreciated.

dionrivera
Path Finder

@scelikok I wanted to update you on the resolution. As it turns out, editing the limits.conf file directly in the app solved my issue. Initially, it was set to the default maxKBps = 256. I set it to 0, which seems to have solved the issue, and now I'm receiving all my events. The new setting in the limits.conf file is

[thruput]

maxKBps = 0

Thanks for all your help!
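
For anyone who wants to confirm the same cause before changing the limit: a throttled forwarder normally writes a ThruputProcessor message to splunkd.log. A rough sketch, assuming the forwarders' internal logs reach the _internal index and that the message wording matches your version:

```
index=_internal sourcetype=splunkd component=ThruputProcessor "has reached maxKBps"
| timechart span=1h count BY host limit=50
```

Hosts that show up here steadily were hitting the maxKBps cap during those hours.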

scelikok
SplunkTrust

Hi @dionrivera,

My confusion is why only Security events are affected. The first thing to check should have been the "thruput" setting but since your system events were working alright we didn't consider that option.

Anyway, very nice to hear it is resolved.

 

If this reply helps you an upvote and "Accept as Solution" is appreciated.