I'm setting up Splunk with a single machine doing both indexing and searching, plus a handful of application servers that I have set up light forwarders on. The application servers currently have several months' worth of logs on them, and I would like those logs to be indexed, but I don't think I want all that historical data to go through the forwarders. What is the best way to get this indexed before starting my forwarders? I was thinking of copying the logs over to the indexer and adding the containing directory as an input. The indexer would then index those files once, and new data would come in from the forwarders.
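For reference, the kind of input I had in mind on the indexer is something like the following monitor stanza (the path and sourcetype here are just placeholders for my setup):

    # inputs.conf on the indexer -- one-time load of copied logs
    # (path and sourcetype are examples, not my real values)
    [monitor:///tmp/historical_logs]
    sourcetype = my_app_logs
    # pull the originating host out of the path, assuming a layout like
    # /tmp/historical_logs/<host>/access.log (third path segment)
    host_segment = 3

I gather I could also feed individual files in one at a time with "splunk add oneshot <file>" from the CLI, if monitoring a directory is overkill for a one-time load.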
IMO the easiest thing would be to just let the forwarders do the work. You could move the files to the index servers to be indexed locally, but then (1) you'd have to make sure they were evenly distributed, and (2) you'd be using the network to move them anyway. If you're concerned about crushing your network, you can adjust the thruput value in limits.conf on the forwarders (see the sketch below). If it's CPU/IO on your forwarders that concerns you, then monitor a directory and slowly move the files into it to be forwarded.
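If you go the throttling route, the setting lives in the [thruput] stanza of limits.conf on each forwarder. The cap below is just an example value; tune it to what your network can absorb:

    # limits.conf on each forwarder -- throttle indexing throughput
    [thruput]
    # maximum KB per second this instance will process;
    # 256 is an example, raise or lower to suit your bandwidth
    maxKBps = 256

Once the backlog is indexed, you can raise (or remove) the cap so current data flows at full speed.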
In the end, though, either way will yield the same results.