Getting Data In

Monitoring files: how does Splunk count the size?

Vladimir
Path Finder

Hi,

Several days ago I configured a directory for monitoring in inputs.conf ([monitor://path_to_dir]) with a separate index for this folder. Everything is OK except one thing: the total size of the files is ~500 MB, but Splunk shows (in Index activity -> Index volume) that it is indexing ~800 MB per hour. How is that possible? There are only 10 MB of new logs per day. Does Splunk resend the whole file whenever it changes (even if only one row was added)?
The total number of events is ~800-900 per hour. My rsyslog index, with ~12,000-15,000 events per hour, grows by only ~100 MB per hour.

I have the same situation with one more monitored folder.
Splunk v4.3.

1 Solution

Lamar
Splunk Employee

It shouldn't resend the whole file again. It should only send the part of the file that was added since the last read.

Keep in mind that when you monitor a path, Splunk recursively searches that directory and all directories below it (this is the default behavior). Is that what you intended?

Additionally, are there files buried deep in that directory that might be causing your indexed volume to blow up?

Without intimate knowledge of your environment, I can only hypothesize about what might be occurring here.

Lamar
Splunk Employee

Keep in mind that your whitelist/blacklist needs to be in regex form. So, you would want:


whitelist = \.log$
blacklist = \.zip$

This should work a bit better for what you're trying to accomplish.
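
For reference, a complete stanza with both patterns in regex form might look like the sketch below. The path `path_to_dir` is the placeholder from the original question, and `<your_index>` is a hypothetical index name that does not appear in the thread; the remaining settings are carried over from the configuration posted in this thread. As far as I know, .conf files only treat `#` as a comment at the start of a line, so the notes sit on their own lines rather than after a value.

```
# Sketch only: path_to_dir and <your_index> are placeholders, not real values
[monitor://path_to_dir]
index = <your_index>
disabled = 0
followTail = 1
recursive = false
# whitelist and blacklist are regular expressions, not shell globs
# index only files whose path ends in .log
whitelist = \.log$
# skip archives so old, compressed logs are not unpacked and indexed
blacklist = \.zip$
```

A restart of Splunk is normally needed for changes to inputs.conf to take effect. Data that was already indexed from the archives stays in the index.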

Vladimir
Path Finder

Hm, it doesn't work.
I can still see in the _internal index that Splunk is reading data from the archive. The current configuration is:

followTail = 1
recursive = false
disabled = 0
whitelist = *.log
blacklist = *.zip #tried to exclude somehow zip files 🙂

Lamar
Splunk Employee

It would in fact.

Vladimir
Path Finder

Will recursive = false help in this case?

Vladimir
Path Finder

There aren't any subfolders, but I figured out that there are several archive files (*.zip with old files), and it looks like (in metrics.log) Splunk unzipped them and indexed them... arrrhh
