I'm trying to determine if a bottleneck exists in my environment. We ingest about 130 GB a day. Syslog events come through without delay, but Windows Events are delayed anywhere between 1,500 and 5,000 minutes.
A caveat is that our environment is hybrid. We host our indexers in Azure. We have an ExpressRoute VPN set up, and write speeds on our index cluster seem artificially low. The ExpressRoute VPN is rated at 1 Gbps.
The indexers' drives are rated for up to 7,500 IOPS. The Heavy Forwarders are on-prem.
We have Windows Events going to 4 Heavy Forwarders (load balanced), then to the Index Cluster (round robin).
Does this indexing rate seem reasonable? It's never really gotten above 2Mbs.
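As a rough sanity check on that number, the daily volume can be converted to an average rate. This is only a back-of-the-envelope sketch: it assumes decimal gigabytes, assumes the 130 GB/day figure is the total across all indexers, and ignores peaks. `average_rate` is a made-up helper name, not part of any Splunk tooling.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_rate(gb_per_day: float) -> tuple[float, float]:
    """Return (megabytes/s, megabits/s) for a daily volume given in decimal GB."""
    bytes_per_second = gb_per_day * 1e9 / SECONDS_PER_DAY
    mbytes = bytes_per_second / 1e6
    mbits = mbytes * 8
    return mbytes, mbits

# 130 GB/day is the ingest figure from the post; 1000 Mbit/s is the link rating.
mbytes_per_s, mbits_per_s = average_rate(130)
link_mbits = 1000

print(f"{mbytes_per_s:.2f} MB/s, {mbits_per_s:.1f} Mbit/s, "
      f"{mbits_per_s / link_mbits:.1%} of link")
# prints: 1.50 MB/s, 12.0 Mbit/s, 1.2% of link
```

If "2Mbs" means megabytes per second, a peak of about 2 is in line with a 1.5 MB/s average for this volume, so the rate itself may simply reflect how much data is arriving. If it means megabits per second, it would be well below the average needed to keep up.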
The average indexing rate will always depend on your particular infrastructure. How many hosts/UFs/HFs sourcing data do you have? How many indexing pipelines do you have per indexer? There are a lot of questions on this topic that probably won't solve your particular problem.
Do the syslog and Windows events go through the same HFs to the IDX layer?
Also, what do you mean by "4 Heavy Forwarders (load balanced)"? Do you have anything (3rd party) between the Windows UFs and these HFs?
Windows Events come from Windows Event Collectors with UFs installed.
Two indexing pipelines per indexer.
The syslog events are also going to the Heavy Forwarders before being indexed.
Have you checked the WEC part? That might be your bottleneck since you're not having issues with syslog going through the same link.
We also enabled Parallel Ingestion on the UFs installed on the WEC servers.
We used these as guidelines.
HP Windows Event Forwarding.
Microsoft Best Practices.
Windows Event Forwarding Survival Guide.
Windows Event Forwarding for Network Defense.
Centralizing Windows Event Forwarding
End-Point Log Consolidation with Windows Event-Forwarder
WEF/WEC is far from the preferred method of collecting Windows events. It's usually much better to have UFs everywhere sending data, even using your heavies as intermediate forwarders before hitting the Azure indexers.
Also, do you have the Windows TA installed on your HFs?
Are you hitting the network bandwidth limit on your VPN route? If yes, are you modifying the actual raw events from the Windows or syslog servers on the Heavy Forwarders? If you are not modifying raw events on the Heavy Forwarders, then I suggest using Universal Forwarders as intermediate forwarders instead, because Heavy Forwarders parse the data and add metadata and other fields to the raw events, which increases network traffic.
Here is a good blog post about Universal vs Heavy Forwarders: https://www.splunk.com/blog/2016/12/12/universal-or-heavy-that-is-the-question.html
We are not hitting the bandwidth limitation of the VPN. We still have plenty of room to breathe. We have cleaned up Windows events (inputs.conf), removed whitespace, blacklisted events, etc.
We also tried making the WECs heavy forwarders. It didn't seem to make much difference.
You see a delay on the Windows events from the forwarders. Can you also check the internal logs of those forwarders? They will tell you whether the problem is in the forwarding or not (if the internal logs are also delayed, then you have a bottleneck after the forwarder).
If only the Windows events are delayed, the delay is often at the collection level, because the forwarder may be waiting for AD to resolve object names.