Getting Data In

Monitoring many files over CIFS

PickleRick
SplunkTrust

Up front: I did search the forums, and I saw the typical responses like "use logrotate". Sorry, that's not applicable in this case.

And the case is:

I have a Windows-based UF which is supposed to ingest files from several servers shared over CIFS shares.

During typical operation it seems to work quite well since we increased the limits on the UF, especially those pertaining to the number of open file descriptors.

But we have issues whenever we have to restart the UF (which mostly happens when we add new sources).

Then the files get re-checked one after another. Even though we have limits like "ignore_older_than" and so on, and the CRCs are salted with the filename so the events effectively aren't re-ingested multiple times, the UF still opens each file in turn and checks its contents. That takes up to a few hours after each UF restart, which is kinda annoying.
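For reference, a minimal sketch of the kind of monitor stanza described above, as it might look in inputs.conf on the UF (the UNC path, age threshold, and index name here are hypothetical placeholders):

```
# inputs.conf on the UF -- hypothetical share path and thresholds
[monitor://\\fileserver\logs\*.log]
# skip files whose modification time is older than 7 days
ignore_older_than = 7d
# salt the CRC with the full source path so files with identical
# beginnings are tracked separately in the fishbucket
crcSalt = <SOURCE>
index = main
```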

Any hints at optimizing that?

Unfortunately, we're quite limited on the source side, since we're not able to either install UFs on the log-producing machines (which of course would at least to some extent alleviate the problem) or move the logs away from the source location. So effectively we're stuck with this setup.

Might there be some issue with Windows sharing so that even though the fishbucket seems to be working properly, the UF still has to scan through the whole file from the beginning? That would explain the delay.
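One way to check whether the UF is actually resuming from the fishbucket or re-reading files from offset 0 is to query the tailing processor's status over the UF's management port. A hedged sketch, assuming the default management port 8089 and placeholder admin credentials (adjust both for your environment):

```
# Query per-file tailing status on the UF (default mgmt port assumed)
# The response lists each monitored file with its current read offset
# and percent read -- a file sitting at 0% long after a restart
# suggests its fishbucket entry isn't being matched.
curl -k -u admin:changeme \
  "https://localhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus"
```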


Vardhan
Contributor

The config only helps to process multiple events at a time and send logs in near real time without much delay. If any use cases are configured to monitor those events, then you can monitor the logs in real time.

PickleRick
SplunkTrust

Yeah, that's pretty much what I thought. It won't help much with the necessity of - for example - reading one huge log first before reading other logs. I'd have to have two separate UF instances on one host for that and only monitor some files on one of them and some on the other. OK. For now I'll just have to live with what I have 🙂

Thx for help.


Vardhan
Contributor

Hi @PickleRick ,

In order to process events faster, you can try increasing the number of ingestion pipelines on the UF. But this will consume more resources on the UF server end.

In %SPLUNK_HOME%\etc\system\local\server.conf:

[general]
parallelIngestionPipelines = 2

https://docs.splunk.com/Documentation/Forwarder/8.2.3/Forwarder/Configureaforwardertohandlemultiplep...

PickleRick
SplunkTrust

Yeah, thought about that. It definitely helps with processing events on HFs. Should it also improve monitoring-related stuff?
