Monitoring Exchange logs over CIFS

PickleRick
SplunkTrust

Hello there.

I'm having a performance problem. I have a "central UF" which is supposed to ingest MessageTracking logs from several Exchange servers. As you can guess from the "several Exchange servers" part, the logs are exposed over CIFS shares (the hosts are in the same domain; to make debugging more complicated, only the service account the UF runs as has access to those shares, and my administrator account doesn't :-)).

Anyway, since there are several Exchange instances and each of the directories has quite a lot of files, the UF sometimes gets "clogged" and, especially after a restart, needs a lot of time to check all the log files, decide that it doesn't need to ingest most of them, and start forwarding real data. To make things more annoying, the forwarder's own logs are ingested by the same monitor input mechanism, so until this process completes I don't even have _internal entries from this host and have to check the physical log files on the forwarder machine to do any debugging or monitoring.

The Windows events, on the other hand, get forwarded right after the forwarder restarts.

So I'm wondering whether I can do anything to improve the efficiency of this ingestion process.

I know that the "officially recommended" way would be to install forwarders on each of the Exchange servers and ingest the files straight from there, but due to organizational limitations that's out of the question (at least at the moment). So I'm stuck with just this one UF.

I already raised thruput, but judging from metrics.log it's not an issue of output throttling or queue blocking.

I raised the number of ingestion pipelines to 2, and my file descriptor limit is set to 2000 at the moment.
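
For reference, the tweaks mentioned above map roughly onto the stanzas below. This is only a sketch: the post does not state the actual thruput value, so maxKBps = 0 (unlimited) is an assumption, and the file names in the comments are the usual locations for these settings rather than something confirmed here.

```ini
# limits.conf
[thruput]
# Assumed value: 0 removes the forwarder's output rate limit.
maxKBps = 0

[inputproc]
# File descriptor limit used by the monitor (tailing) input.
max_fd = 2000

# server.conf
[general]
# Number of ingestion pipeline sets.
parallelIngestionPipelines = 2
```

Comments are kept on their own lines because Splunk .conf files do not support trailing inline comments.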

A typical single-directory monitor input definition looks something like this:

[monitor://\\host1\mtrack$\]
disabled = 0
whitelist = \.LOG$
host = host1
sourcetype = MSExchange:2013:MessageTracking
index = exchange
ignoreOlderThan = 3d
_meta=site::site1

And I have around 14, maybe 16, of those to monitor, which means that when I run splunk list inputstatus I get around 500k files (most of them are ignored, but they first have to be checked for modification time and possibly for CRC)!

I think I will have to tell the customer that it's simply beyond the performance limits of any machine (especially when all this file stat-ing happens over the network), but I was wondering if there are any tweaks I could apply even in this situation.
