Getting Data In

Monitoring files: how does Splunk count the size?

Vladimir
Path Finder

Hi,

Several days ago I configured a directory for monitoring in inputs.conf ([monitor://path_to_dir]) and a separate index for this folder. Everything is OK except one thing: the total size of the files is ~500 MB, but Splunk shows (in Index activity -> Index volume) that it is indexing ~800 MB per hour. How is that possible? There are only 10 MB of new logs per day. Does Splunk resend the whole file if it has changed (even if only one row was added)?
The total number of events is ~800-900 per hour. My rsyslog index, with ~12,000-15,000 events/h, grows by only ~100 MB/h.

I have the same situation with one more monitored folder.
Splunk v4.3.

Lamar
Splunk Employee

It shouldn't resend the whole file again. It should only send the portion of the file that was added since the last read.

Keep in mind that when you monitor a path, Splunk by default recurses into the directory and all directories below it. Is that what is happening here?

Additionally, are there files buried deep in that directory that might be causing your file size to blow up?

Without having intimate knowledge of your environment, I'm having to hypothesize about what might be occurring here.

Lamar
Splunk Employee

Keep in mind that your whitelist/blacklist needs to be in regex form. So, you would want:


whitelist = \.log$
blacklist = \.zip$

This should work a bit better for what you're trying to accomplish.
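To illustrate why the glob-style values fail, here is a minimal sketch that uses Python's re module as a stand-in for Splunk's regex engine. It is an assumption that the two behave alike for these simple patterns, and is_monitored is a hypothetical model of the filtering (blacklist checked first), not Splunk's actual code.

```python
import re

def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

def is_monitored(filename, whitelist, blacklist):
    """Hypothetical filter model: a blacklist match excludes the file,
    otherwise the whitelist must match."""
    if re.search(blacklist, filename):
        return False
    return re.search(whitelist, filename) is not None

print(is_valid_regex(r"\.log$"))  # True
print(is_valid_regex("*.log"))    # False: a leading * has nothing to repeat
print(is_monitored("app.log", r"\.log$", r"\.zip$"))       # True
print(is_monitored("old_logs.zip", r"\.log$", r"\.zip$"))  # False
```

A glob such as *.log is not a valid regex because the leading * has nothing to repeat, while \.log$ matches any file name ending in .log.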

Vladimir
Path Finder

Hmm, it doesn't work.
I can still see in the _internal index that Splunk is reading data from the archives. The current configuration is:

followTail = 1
recursive = false
disabled = 0
whitelist = *.log
blacklist = *.zip #tried to exclude somehow zip files 🙂

Lamar
Splunk Employee

It would in fact.

Vladimir
Path Finder

Will recursive = false help in this case?

Vladimir
Path Finder

There aren't any subfolders, but I figured out that there are several archive files (*.zip with old files), and it looks like (according to metrics.log) Splunk unzipped and indexed them... arrrhh
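For anyone who wants to check whether archives could explain a gap like this, here is a rough sketch in Python (not a Splunk tool). It compares the on-disk size of a directory with the size after expanding any zip files in it. The function name is hypothetical, and it only looks at the top level of the directory, matching the no-subfolders case here.

```python
import os
import zipfile

def on_disk_and_expanded_size(directory):
    """Return (bytes on disk, bytes after expanding zip archives)
    for the regular files directly inside the directory."""
    on_disk = 0
    expanded = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        size = os.path.getsize(path)
        on_disk += size
        if zipfile.is_zipfile(path):
            # file_size is the uncompressed size of each archive member
            with zipfile.ZipFile(path) as archive:
                expanded += sum(info.file_size for info in archive.infolist())
        else:
            expanded += size
    return on_disk, expanded

# Example call (the path is a placeholder):
# on_disk, expanded = on_disk_and_expanded_size("/path/to/monitored_dir")
```

If the expanded figure is much larger than the on-disk figure, archived logs are a plausible source of the extra indexed volume.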
