Hey everyone, I am using a script (running via cron) to pull files from a remote server once a day. On that remote server, there is an export which runs once a day, and is then zipped. This is stored in a repository directory of all files ever created.
I run this command to pull down the files:
rsync -avz --exclude 'callhistoryexport_L*' myuser@1.2.3.4:FOLDER/callhistoryexport*.gz FOLDER/
Now, this should preserve all file information, including the modtime of the files. The script is set up because more devices will be added in the future, and they will all store their exports in this directory.
The universal forwarder installed on the machine running this script is configured to monitor that directory.
Now, after rsync runs, the Splunk forwarder re-sends everything to the indexers, and I can't figure out why. rsync is preserving all timestamps, so that shouldn't be it. It's not re-downloading anything, only the new file (verified in the logs from the script). So why is the forwarder resending everything?
Is anyone doing anything similar to this? If so, how did you resolve this problem? Or is there any way to get the forwarder to stop forwarding this data?
Look at the rsync man page entries for the --inplace and --append options.
--inplace
This causes rsync not to create a new copy of the file and then move it into place. Instead rsync will overwrite the existing file, meaning that the rsync algorithm can’t accomplish the full amount of network reduction it might be able to otherwise (since it does not yet try to sort data matches). One exception to this is if you combine the option with --backup, since rsync is smart enough to use the backup file as the basis file for the transfer. This option is useful for transfer of large files with block-based changes or appended data, and also on systems that are disk bound, not network bound. The option implies --partial (since an interrupted transfer does not delete the file), but conflicts with --partial-dir and --delay-updates. Prior to rsync 2.6.4 --inplace was also incompatible with --compare-dest and --link-dest. WARNING: The file’s data will be in an inconsistent state during the transfer (and possibly afterward if the transfer gets interrupted), so you should not use this option to update files that are in use. Also note that rsync will be unable to update a file in-place that is not writable by the receiving user.

--append
This causes rsync to update a file by appending data onto the end of the file, which presumes that the data that already exists on the receiving side is identical with the start of the file on the sending side. If that is not true, the file will fail the checksum test, and the resend will do a normal --inplace update to correct the mismatched data. Only files on the receiving side that are shorter than the corresponding file on the sending side (as well as new files) are sent. Implies --inplace, but does not conflict with --sparse (though the --sparse option will be auto-disabled if a resend of the already-existing data is required).
Ok great, hopefully this resolves the issue then, I'll know tomorrow morning!
Hi @msarro,
I have a similar situation coming up. Do you know if this resolved the issue?
Thanks
I think here's the catch, based on how I understand rsync to work. Without --inplace, when rsync goes to update a file, it makes a copy of said file to be updated, does the update, then does a delete/rename. This should work, but --append is more like what Splunk is expecting to see...
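To make the copy-then-rename point concrete, here is a rough sketch that does not call rsync at all. It only imitates the two update strategies with cp, mv and a plain append, and prints the file's inode after each step. The file names and the inode_of helper are made up for illustration, and whether the forwarder actually cares about the inode is not something this shows.

```shell
#!/bin/sh
# Simulation only: imitates two update strategies with cp/mv and >>.
# No rsync is involved; file names and inode_of are hypothetical.
workdir=$(mktemp -d)
target="$workdir/export.log"
printf 'line 1\n' > "$target"

# Print the inode number of a file (first column of ls -i).
inode_of() { ls -i "$1" | awk '{print $1}'; }

inode_start=$(inode_of "$target")

# Strategy 1 (what rsync does without --inplace):
# build an updated temporary copy, then rename it over the target.
tmp="$workdir/.export.log.tmp"
cp "$target" "$tmp"
printf 'line 2\n' >> "$tmp"
mv "$tmp" "$target"
inode_after_rename=$(inode_of "$target")

# Strategy 2 (closer to --inplace / --append):
# write directly onto the end of the existing file.
printf 'line 3\n' >> "$target"
inode_after_append=$(inode_of "$target")

echo "start:        $inode_start"
echo "after rename: $inode_after_rename"
echo "after append: $inode_after_append"

rm -rf "$workdir"
```

After the rename step the path points at a different file on disk (new inode), while the append step leaves the same file in place and only makes it longer.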
Trying append and seeing if that will work, will get back to you. So rsync always touches the files no matter what?