I have a Splunk universal forwarder on a client machine. It has a deployed app whose inputs.conf looks like this:
[monitor:///export/home/storeadm/r*]
disabled = true
followTail = 0
index = contentkeeper
source = contentkeeper_passed
sourcetype = contentkeeper_passed
whitelist = (/r.*\.csv.gz$|/r.*\.csv$)
On the indexer there is a corresponding props.conf entry:
[contentkeeper_passed]
REPORT-ckpassed = ckpassed_extractions
And a corresponding transforms.conf entry:
[ckpassed_extractions]
DELIMS=","
FIELDS="Time","Category","IP-Address","Username","Bytes","Status","Content-Type","Url","Policy","Category-Description"
The data files are all compressed (.csv.gz) so the second whitelist match is superfluous. There are a few months of data sitting in that directory.
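To double-check the claim about the whitelist, here is a small sketch that tests the pattern against the file names listed in this post. It uses Python's `re` module as a stand-in for Splunk's regex engine, which is an assumption; the names `is_whitelisted`, `FIRST_ALTERNATIVE`, and `LISTED_FILES` are made up for illustration.

```python
import re

# Whitelist pattern copied verbatim from the inputs.conf stanza in this post.
WHITELIST = re.compile(r"(/r.*\.csv.gz$|/r.*\.csv$)")

# Only the first alternative, to see whether it covers every file on its own.
FIRST_ALTERNATIVE = re.compile(r"/r.*\.csv.gz$")

# File names taken from the directory listing in this post.
LISTED_FILES = [
    "/export/home/storeadm/r29-12-2011.csv.gz",
    "/export/home/storeadm/r30-01-2012.csv.gz",
    "/export/home/storeadm/r30-10-2011.csv.gz",
    "/export/home/storeadm/r30-11-2011.csv.gz",
    "/export/home/storeadm/r30-12-2011.csv.gz",
]

def is_whitelisted(path):
    """Return True if the path matches the whitelist (unanchored search)."""
    return WHITELIST.search(path) is not None

# Every listed file passes the whitelist, and the first alternative alone
# already matches each of them.
print(all(is_whitelisted(p) for p in LISTED_FILES))
print(all(FIRST_ALTERNATIVE.search(p) for p in LISTED_FILES))
```

If both lines print `True`, the `.csv` alternative never comes into play for these files, so it should not be what causes extra indexing.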
The volume of data is quite small (only tens of MB per day). PS: sorry about the timestamps. I touched the files as a test, but usually each file has an incrementing daily timestamp.
-rw-r--r-- 1 storeadm storeadm 1.3M Mar 15 15:41 r29-12-2011.csv.gz
-rw-r--r-- 1 storeadm storeadm 38M Mar 15 15:41 r30-01-2012.csv.gz
-rw-r--r-- 1 storeadm storeadm 2.5M Mar 15 15:41 r30-10-2011.csv.gz
-rw-r--r-- 1 storeadm storeadm 44M Mar 15 15:41 r30-11-2011.csv.gz
-rw-r--r-- 1 storeadm storeadm 781K Mar 15 15:41 r30-12-2011.csv.gz
However, my license quota is often exceeded, typically with more than 20GB (that's GIGABYTES!) per day. I don't think the months of data are the problem: the entire directory is only 3.6GB.
$ du -sh /export/home/storeadm/
3.6G /export/home/storeadm
I think the problem is that Splunk is re-indexing the same files:
$ grep "reading path" splunkd.log | awk '{print $8}' | sort | uniq -c
...
4 path=/export/home/storeadm/r30-10-2011.csv.gz
2 path=/export/home/storeadm/r30-11-2011.csv.gz
6 path=/export/home/storeadm/r30-12-2011.csv.gz
2 path=/export/home/storeadm/r31-10-2011.csv.gz
2 path=/export/home/storeadm/r31-12-2011.csv.gz
These are the kinds of entries I'm grepping over.
03-17-2012 06:42:20.258 +1100 INFO ArchiveProcessor - handling file=/export/home/storeadm/r09-03-2012.csv.gz
03-17-2012 06:42:20.295 +1100 INFO ArchiveProcessor - reading path=/export/home/storeadm/r09-03-2012.csv.gz (seek=0 len=50548338)
03-17-2012 07:28:09.552 +1100 INFO ArchiveProcessor - Finished processing file '/export/home/storeadm/r09-03-2012.csv.gz', removing from stats
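A slightly richer version of the grep above could tally both how many times each file was read and the total `len` reported across those reads. This is only a sketch: it assumes every relevant line follows the `reading path=... (seek=... len=...)` format shown above, the function name `summarize_reads` is hypothetical, and I have not verified whether `len` reflects compressed or uncompressed bytes.

```python
import re
from collections import defaultdict

# Matches the ArchiveProcessor "reading path" lines in the format shown above.
READ_RE = re.compile(
    r"ArchiveProcessor - reading path=(\S+) \(seek=(\d+) len=(\d+)\)"
)

def summarize_reads(lines):
    """Return {path: (read_count, total_len)} for each archive that was read."""
    counts = defaultdict(int)
    total_len = defaultdict(int)
    for line in lines:
        match = READ_RE.search(line)
        if match:
            path = match.group(1)
            counts[path] += 1
            total_len[path] += int(match.group(3))
    return {path: (counts[path], total_len[path]) for path in counts}

# The one real "reading path" line from this post, repeated twice purely to
# simulate a re-read (the repetition is illustrative, not actual log data).
line = (
    "03-17-2012 06:42:20.295 +1100 INFO ArchiveProcessor - reading "
    "path=/export/home/storeadm/r09-03-2012.csv.gz (seek=0 len=50548338)"
)
sample = [line, line]

for path, (count, size) in summarize_reads(sample).items():
    print(count, size, path)
# prints: 2 101096676 /export/home/storeadm/r09-03-2012.csv.gz
```

On a real forwarder the lines would come from iterating over the open splunkd.log file instead of `sample`. Any path with a count above 1 has been read from `seek=0` more than once.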
What should I do to check whether Splunk is re-indexing the same files, contributing to my license problem? Is there some search I can run over the metrics index?