Hi,
I have 6 log files, and the total number of events across all of them is 10666.
I copied the log files onto my forwarder node.
When I checked the index with index=my_raw_index,
the event count shown was 21332, double the actual count.
When I checked the sources, there were 12 sources instead of 6. Some of the sources are fileparts of the actual log files, e.g. mylog.log-20130514.filepart.
If I run the query: index=my_raw_index | where like(source, "%20130514"),
it gives me 10666.
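To see where the extra events come from, a count by source should list both each original file and its .filepart twin (using the same index as above):
index=my_raw_index | stats count by source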
Since the files are large, it took some time for them to be copied completely.
I have the following settings in the inputs.conf file on the forwarder node.
[monitor:///home/data/aaa/bbb/*]
disabled = false
sourcetype = bbb_ccc
index = my_raw_index
crcSalt = <SOURCE>
How can I avoid indexing these filepart files? What settings should I use so that the data is not indexed twice?
Thanks
Strive
blacklist = \.(filepart)$
Remove "crcSalt = <SOURCE>".
You'll need to re-index those log files, as Splunk has already seen them and will not re-index them unless you do something like clean the index (if that's possible on this index).
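For reference, a sketch of the monitor stanza with those changes applied (same path, sourcetype and index as in the question; the blacklist regex is matched against the file path, so anything ending in .filepart is skipped):

[monitor:///home/data/aaa/bbb/*]
disabled = false
sourcetype = bbb_ccc
index = my_raw_index
blacklist = \.(filepart)$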
The combined size of the 6 log files is 4.5 MB.
In production the combined size of the log files would be around 8 MB.
Thanks a lot, kristian. Your suggestion to copy the files to a temp folder first makes sense.
This solution worked...thanks #the_wolverine
Did you clean out the fishbucket as well? Unless you do so, Splunk will not re-index the files.
The fishbucket is an internal index (which can be cleaned) where Splunk stores what it has already seen (files, offset pointers). Beware though that if you clean it, Splunk will re-index any file it's been configured to monitor (if the files are still there).
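If you do need to clear it, a rough sketch on the forwarder (this wipes the tracking data for all monitored files, so anything still on disk will be read again):

$SPLUNK_HOME/bin/splunk stop
$SPLUNK_HOME/bin/splunk clean eventdata -index _thefishbucket
$SPLUNK_HOME/bin/splunk start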
Oh, for reasons that you've just experienced, you should not copy huge files over the network directly into a monitored folder. It's better to copy them to a temp folder (on the same file system) and then move them into the monitored folder.
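A minimal sketch of that approach, assuming a hypothetical staging folder on the same file system as the monitored folder from the question ("forwarder" stands in for the forwarder's host name):

# the slow copy over the network goes to the staging folder, not the monitored one
scp mylog.log-20130514 forwarder:/home/data/aaa/staging/
# then, on the forwarder, move it into place; mv within one file system is effectively instant, so Splunk never sees a partial file
mv /home/data/aaa/staging/mylog.log-20130514 /home/data/aaa/bbb/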
/k
It's working. Thank you.
I cleaned the index and added blacklist = \.(filepart)$.
I did not remove crcSalt = <SOURCE>.
Data is not getting indexed.
You should really only use batch for a one-time read of log files that are deleted after indexing. Please refer to the documentation for the batch input to confirm that this is what you really want to do.
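For reference, a batch stanza would look roughly like this (same path, sourcetype and index as the monitor stanza above; move_policy = sinkhole means Splunk deletes each file once it has been indexed):

[batch:///home/data/aaa/bbb]
disabled = false
move_policy = sinkhole
sourcetype = bbb_ccc
index = my_raw_index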
Thank you for your response. I will check your solution.
The combined size of the 6 log files is 4.5 MB.
Should I use [batch] rather than monitor in this scenario? In the production environment it will be around 7 MB.