At present, we have a standalone Splunk server that monitors a mapped directory of log files. To reduce the load, we are adding a search head, and we also want to install a universal forwarder that will forward the files in the mapped directory to the indexer. Most of the log files in the mapped directory are already indexed, while about 10-15% of them are yet to be indexed.
I believe installing a forwarder and forwarding these files to Splunk should not cause re-indexing, since Splunk keeps track of the files that have already been indexed. However, when I tried this scenario in a test environment, with a small subset of the data, I noticed all the files in the directory were re-indexed. Is this to be expected? Or is there something wrong with my configuration?
Not sure how to prevent this altogether; however, you could temporarily set the MAX_DAYS_AGO value for the sourcetype on your indexer so that it doesn't reindex more than one day's worth of data. You could then delete the duplicate events with the | delete command.
# in props.conf on indexer
[my_source_type]
MAX_DAYS_AGO = 1
From the props.conf documentation:
MAX_DAYS_AGO = <integer>
* Specifies the maximum number of days past, from the current date, that an extracted date can be valid.
* For example, if MAX_DAYS_AGO = 10, Splunk ignores dates that are older than 10 days ago.
* Defaults to 2000 (days), maximum 10951.
* IMPORTANT: If your data is older than 2000 days, increase this setting.
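To make the documented check concrete, here is a minimal Python sketch of it. It is a simplified model written for illustration, not Splunk code: the function name extracted_date_is_valid is hypothetical, rejecting negative values is an assumption, and the sketch only decides whether an extracted date falls inside the MAX_DAYS_AGO window. It does not model what Splunk then does with the event.

```python
from datetime import datetime, timedelta

# Bounds quoted in the props.conf documentation excerpt.
DEFAULT_MAX_DAYS_AGO = 2000
UPPER_LIMIT_MAX_DAYS_AGO = 10951


def extracted_date_is_valid(extracted: datetime, now: datetime,
                            max_days_ago: int = DEFAULT_MAX_DAYS_AGO) -> bool:
    """Hypothetical model: True if `extracted` is at most `max_days_ago` days before `now`."""
    # Assumption: negative values are treated as invalid input here.
    if max_days_ago < 0 or max_days_ago > UPPER_LIMIT_MAX_DAYS_AGO:
        raise ValueError(f"max_days_ago must be in 0..{UPPER_LIMIT_MAX_DAYS_AGO}")
    # The date is valid if it is no older than the cutoff.
    return extracted >= now - timedelta(days=max_days_ago)


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, 0)
    # With MAX_DAYS_AGO = 1, a timestamp from three days ago falls outside the window...
    print(extracted_date_is_valid(now - timedelta(days=3), now, max_days_ago=1))  # False
    # ...while one from five hours ago falls inside it.
    print(extracted_date_is_valid(now - timedelta(hours=5), now, max_days_ago=1))  # True
```

Note that, per the excerpt, it is the extracted date that gets ignored, which is why the sketch returns a validity flag rather than dropping anything.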
Is the path of the mapped directory the same on the server as on the UF?
You could try copying the "_fishbucket" directory from the server to the UF (and then restarting the UF).
The _fishbucket index keeps track of what has and hasn't been indexed.
Haven't tested it, but in theory it should work.
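The reason a fresh forwarder re-reads everything can be illustrated with a toy Python model of per-instance file tracking. This is a hypothetical simplification, not Splunk's implementation: the class FishbucketModel and its methods are invented for illustration. It keys each file on a CRC of its first 256 bytes (Splunk's default initCrcLength in inputs.conf), optionally salted with the file path (as crcSalt = <SOURCE> does), and remembers how many bytes were already read. The real fishbucket stores more than this, and the model does not try to handle files that grow.

```python
import tempfile
import zlib
from pathlib import Path

# Splunk's default initCrcLength: the CRC covers the first 256 bytes of a file.
INIT_CRC_LENGTH = 256


class FishbucketModel:
    """Hypothetical, simplified stand-in for one instance's file-tracking state."""

    def __init__(self, salt_with_path: bool = False):
        self.salt_with_path = salt_with_path
        self.seek_pointers: dict[int, int] = {}  # head CRC -> bytes already read

    def _key(self, path: Path) -> int:
        with path.open("rb") as f:
            head = f.read(INIT_CRC_LENGTH)
        if self.salt_with_path:
            # Salting with the path means the same file at another path looks new.
            head += str(path).encode()
        return zlib.crc32(head)

    def read_new_data(self, path: Path) -> bytes:
        """Return bytes not yet 'indexed' and advance the stored seek pointer."""
        key = self._key(path)
        offset = self.seek_pointers.get(key, 0)
        data = path.read_bytes()[offset:]
        self.seek_pointers[key] = offset + len(data)
        return data

    def copy(self) -> "FishbucketModel":
        """Simulate copying the tracking state to another instance."""
        clone = FishbucketModel(self.salt_with_path)
        clone.seek_pointers = dict(self.seek_pointers)
        return clone


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "app.log"
        log.write_bytes(b"line 1\nline 2\n")

        server = FishbucketModel()
        server.read_new_data(log)  # the standalone server indexes the file

        fresh_uf = FishbucketModel()  # a new UF starts with empty tracking state
        print(fresh_uf.read_new_data(log))  # b'line 1\nline 2\n' (re-read)

        copied_uf = server.copy()  # a UF seeded with the server's state
        print(copied_uf.read_new_data(log))  # b'' (nothing re-read)
```

If the tracking key includes the path, the copied state only helps when the path on the UF matches the path on the server, which is why the path question matters.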
Yes, the path is the same. Thanks for the suggestion, let me give it a shot.
For now, we decided against installing the forwarder. So I won't get a chance to try out these suggestions. I'll update here if we do try this in the future.
It should be "_thefishbucket", btw.
We just installed Splunk and wanted to index a year's worth of data, so the modification times of these files range from 1 day to 365 days ago. There is no way of knowing which files have already been indexed and which are still being indexed, so I still want the yet-to-be-indexed files to be indexed.
It would be interesting to know whether the _fishbucket method suggested in this thread works for you. Otherwise, you could also clean the index after installing the universal forwarder, which would cause all events for that index to be reindexed.