Getting Data In

monitoring files - how does splunk count the size?

Vladimir
Path Finder

Hi,

Several days ago I configured a directory for monitoring in inputs.conf ([monitor://path_to_dir]) and set up a separate index for this folder. Everything is OK except one thing: the total size of the files is ~500 MB, but Splunk shows (in Index Activity -> Index Volume) that it is indexing ~800 MB per hour. How is that possible? There are only 10 MB of new logs per day. Does Splunk resend the whole file whenever it changes (even if only one row was added)?
The total number of events is ~800-900 per hour. My rsyslog index, with ~12,000-15,000 events/h, grows by only ~100 MB/h.

I have the same situation with one more monitored folder.
Splunk v4.3.


Lamar
Splunk Employee

It shouldn't resend the whole file again. It should only send the data that was newly appended to the file.

Keep in mind, when you monitor a path: are you recursing through the directory and all directories below it (this is the default behavior)?

Additionally, are there files buried deep in that directory that might be causing your file size to blow up?

Without having intimate knowledge of your environment, I'm having to hypothesize about what might be occurring here.

Lamar
Splunk Employee

Keep in mind that your whitelist/blacklist needs to be in regex form. So, you would want:


whitelist = \.log$
blacklist = \.zip$

This should work a bit better for what you're trying to accomplish.
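For context, here is a minimal sketch of how the whole stanza might look with the regex-style filters in place. The path path_to_dir is the placeholder from the question, and my_index is a hypothetical index name; neither comes from a real environment. Comments are kept on their own lines, since a trailing comment on a setting line may end up being read as part of the value.

```
# inputs.conf (sketch; path_to_dir and my_index are placeholders)
[monitor://path_to_dir]
index = my_index
disabled = 0
# Do not descend into subdirectories
recursive = false
# whitelist and blacklist are regular expressions matched against the file path
whitelist = \.log$
blacklist = \.zip$
```

With a whitelist like this, only files whose path ends in .log should be picked up, so the blacklist line is mostly a safety net for the archive files.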


Vladimir
Path Finder

Hm, it doesn't work.
I can still see in the _internal index that Splunk is pulling data from the archive. The current configuration is:

followTail = 1
recursive = false
disabled = 0
whitelist = *.log
blacklist = *.zip #tried to exclude somehow zip files 🙂


Lamar
Splunk Employee

It would in fact.


Vladimir
Path Finder

Will recursive = false help in this case?


Vladimir
Path Finder

There aren't any subfolders, but I figured out that there are several archive files (*.zip with old files), and it looks like (in metrics.log) Splunk unzipped and indexed them... arrrhh
