Hi,
Several days ago I configured a directory for monitoring in inputs.conf ([monitor://path_to_dir]) with a separate index for this folder. Everything is OK except one thing: the total size of the files is ~500 MB, but Splunk shows (in index activity -> index volume) that it is indexing ~800 MB per hour. How is that possible? There are only 10 MB of new logs per day. Does Splunk resend the whole file if it has changed (even if only one row was added)?
The total number of events is ~800-900 per hour. My rsyslog index, with ~12,000-15,000 events/h, grows by only ~100 MB/h.
I have the same situation with one more monitored folder.
Splunk v4.3
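A quick back-of-the-envelope check using only the figures quoted above shows how far off the numbers are. Taking the midpoints of the quoted ranges is an assumption made just for this illustration.

```python
# Sanity check using only the numbers from the question.
# Midpoints of the quoted ranges are an illustrative assumption.
new_log_mb_per_day = 10.0          # "10 MB of new logs per day"
observed_mb_per_hour = 800.0       # index activity -> index volume
monitored_events_per_hour = 850    # midpoint of ~800-900 events/h
rsyslog_events_per_hour = 13_500   # midpoint of ~12,000-15,000 events/h
rsyslog_mb_per_hour = 100.0

# What the volume should be if only new data were indexed.
expected_mb_per_hour = new_log_mb_per_day / 24
inflation = observed_mb_per_hour / expected_mb_per_hour

# Average indexed size per event for each index, in KB.
kb_per_event_monitored = observed_mb_per_hour * 1024 / monitored_events_per_hour
kb_per_event_rsyslog = rsyslog_mb_per_hour * 1024 / rsyslog_events_per_hour

print(f"expected ~{expected_mb_per_hour:.2f} MB/h, observed "
      f"{observed_mb_per_hour:.0f} MB/h ({inflation:.0f}x more)")
print(f"~{kb_per_event_monitored:.0f} KB/event (monitored dir) vs "
      f"~{kb_per_event_rsyslog:.1f} KB/event (rsyslog)")
```

Roughly 1 MB per event versus a few KB per rsyslog event suggests the extra volume is not coming from ordinary appended log lines.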
It shouldn't resend the whole file. It should only send the data appended since the last read.
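To illustrate the idea of sending only new data, here is a minimal sketch of offset-based tailing. It is not Splunk's implementation: the `offsets` dict is a hypothetical in-memory stand-in for the state Splunk keeps on disk, and Splunk's real mechanism also fingerprints the start of each file, which this sketch omits.

```python
import os
import tempfile


def read_new_data(path, offsets):
    """Return only the bytes appended to `path` since the previous call.

    `offsets` (hypothetical) maps each path to the byte position already read.
    """
    size = os.path.getsize(path)
    start = offsets.get(path, 0)
    if size < start:
        # File got smaller: assume it was truncated or rotated and start over.
        start = 0
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(size - start)
    offsets[path] = start + len(data)
    return data


# Usage: write a row, append another, and read three times.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "app.log")
    offsets = {}
    with open(log, "wb") as f:
        f.write(b"row 1\n")
    first = read_new_data(log, offsets)
    with open(log, "ab") as f:
        f.write(b"row 2\n")
    second = read_new_data(log, offsets)
    third = read_new_data(log, offsets)  # nothing new since last read

print(first, second, third)  # b'row 1\n' b'row 2\n' b''
```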
Keep in mind: when you monitor a path, are you monitoring the directory and every directory below it recursively? (That is the default behavior.)
Additionally, are there files buried deep in that directory tree that might be inflating the indexed volume?
Without having intimate knowledge of your environment, I'm having to hypothesize about what might be occurring here.
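One way to check what a recursive monitor could pick up is to list every file under the monitored path, largest first. This is a hypothetical helper (`files_under` is not part of Splunk); the `recursive` flag only mimics the effect of `recursive = false` in inputs.conf by stopping at the top-level directory.

```python
import os
import tempfile


def files_under(root, recursive=True):
    """Return (path, size_in_bytes) for files under `root`, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append((full, os.path.getsize(full)))
        if not recursive:
            break  # os.walk yields `root` first, so stop after the top level
    return sorted(found, key=lambda item: item[1], reverse=True)


# Usage: one top-level file and one file buried two levels down.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deeper"))
    with open(os.path.join(tmp, "app.log"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(tmp, "sub", "deeper", "buried.log"), "wb") as f:
        f.write(b"x" * 100)
    all_files = files_under(tmp)
    top_only = files_under(tmp, recursive=False)

for path, size in all_files:
    print(size, os.path.relpath(path, tmp))
```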
Keep in mind that your whitelist/blacklist values must be regular expressions, not wildcards. So you would want:
whitelist = \.log$
blacklist = \.zip$
This should work a bit better for what you're trying to accomplish.
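The difference between a wildcard and a regex can be checked quickly in Python. This sketch approximates the filtering with Python's `re` and assumes the patterns are matched against the full file path; Splunk itself uses PCRE, which should agree with `re` on patterns this simple. A bare `*.log` is not a valid regex at all, since `*` has nothing to repeat.

```python
import re

WHITELIST = r"\.log$"
BLACKLIST = r"\.zip$"


def would_monitor(path):
    # Approximation: the whitelist must match and the blacklist must not.
    return (re.search(WHITELIST, path) is not None
            and re.search(BLACKLIST, path) is None)


def is_valid_regex(pattern):
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Hypothetical paths for illustration.
paths = ["/var/app/server.log", "/var/app/old_logs.zip", "/var/app/server.log.1"]
decisions = {p: would_monitor(p) for p in paths}
glob_is_regex = is_valid_regex("*.log")  # False: "nothing to repeat"

for p, ok in decisions.items():
    print(p, "->", "monitored" if ok else "skipped")
print("'*.log' is a valid regex:", glob_is_regex)
```

With these two patterns the whitelist alone already excludes `.zip` files in this sketch; the blacklist acts as an extra safeguard.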
Hm... it doesn't work.
I can still see in the _internal index that Splunk is polling data from the archive. My current configuration is:
followTail = 1
recursive = false
disabled = 0
whitelist = *.log
blacklist = *.zip #tried to somehow exclude zip files 🙂
It would in fact.
Will recursive = false help in this case?
There are no subfolders, but I figured out there are several archive files (*.zip with old files), and it looks like (in metrics.log) Splunk unzipped them and indexed them... arrrhh
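If archives inside the monitored folder are being unpacked and indexed, as metrics.log suggests here, their uncompressed size is what counts, not their size on disk. This hypothetical helper lists archive files under a folder and, for zip files, sums the uncompressed sizes recorded in the archive. The extension list is an assumption; adjust it for your environment.

```python
import os
import tempfile
import zipfile

# Assumed list of archive extensions worth flagging.
ARCHIVE_SUFFIXES = (".zip", ".gz", ".tgz", ".tar", ".bz2")


def archive_report(root):
    """Return (path, bytes_on_disk, bytes_uncompressed_or_None) for archives."""
    report = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(ARCHIVE_SUFFIXES):
                continue
            full = os.path.join(dirpath, name)
            expanded = None
            if zipfile.is_zipfile(full):
                # Sum the uncompressed sizes stored in the zip's directory.
                with zipfile.ZipFile(full) as zf:
                    expanded = sum(info.file_size for info in zf.infolist())
            report.append((full, os.path.getsize(full), expanded))
    return report


# Usage: a current log plus a highly compressible zip of "old" logs.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "current.log"), "wb") as f:
        f.write(b"today's rows\n")
    with zipfile.ZipFile(os.path.join(tmp, "old_logs.zip"), "w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("old.log", b"x" * 100_000)
    report = archive_report(tmp)

for path, on_disk, expanded in report:
    print(os.path.basename(path), on_disk, "bytes on disk ->",
          expanded, "bytes uncompressed")
```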