
Rsync Files Reindexing

msarro
Builder

Hey everyone, I'm using a script (run via cron) to pull files from a remote server once a day. On that remote server, an export runs once a day and is then gzipped. The result is stored in a repository directory that holds every file ever created.

I run this command to pull down the files:

rsync -avz --exclude 'callhistoryexport_L*' myuser@1.2.3.4:FOLDER/callhistoryexport*.gz FOLDER/
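As a rough illustration of which remote files that command selects: the source glob `callhistoryexport*.gz` is expanded by the shell on the remote host, and the `--exclude` pattern (which contains no slash, so rsync matches it against each file's name) then drops anything starting with `callhistoryexport_L`. The sketch below models only that selection with a shell `case`. The `would_pull` helper and the example filenames are hypothetical, and this is not how rsync evaluates its filter rules internally.

```shell
#!/bin/sh
# Simplified model (not rsync's filter engine) of which file names the
# rsync command above pulls. Hypothetical helper, for illustration only.
would_pull() {
    case $1 in
        callhistoryexport_L*) echo no ;;   # removed by --exclude 'callhistoryexport_L*'
        callhistoryexport*.gz) echo yes ;; # matched by the remote source glob
        *) echo no ;;                      # not matched by the source glob at all
    esac
}

# Hypothetical file names, not taken from the real export directory.
for name in callhistoryexport_20240101.gz callhistoryexport_L20240101.gz callhistoryexport_20240101.csv; do
    printf '%s -> %s\n' "$name" "$(would_pull "$name")"
done
```

Only the first example name prints `yes`; the second is excluded and the third does not end in `.gz`.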

Now, this should preserve all file information, including the files' modification times. The script is set up this way because more devices will be added in the future, and they will all store their exports in this directory.

The universal forwarder installed on the machine running this script is configured to monitor that directory.

Now, after rsync runs, the Splunk forwarder re-sends everything to the indexers, and I can't figure out why. Rsync preserves all timestamps, so that shouldn't be the cause. It isn't re-downloading anything except the new file (verified in the script's logs). So why is the forwarder resending everything?
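One way to check whether rsync really left the existing files alone is to record each export's inode number along with its size and mtime before and after a run: an unchanged mtime combined with a new inode would mean the file was rewritten and renamed into place rather than skipped. This sketch assumes GNU coreutils `stat` (the `-c` option is not available on BSD/macOS), and the `snapshot` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical diagnostic: print "name inode size mtime" for every export in a
# directory, so snapshots taken before and after rsync can be compared.
# Assumes GNU coreutils stat (-c is not portable to BSD/macOS).
set -eu

snapshot() {
    for f in "$1"/callhistoryexport*.gz; do
        [ -e "$f" ] || continue        # the glob matched nothing
        stat -c '%n %i %s %Y' "$f"     # name, inode, size in bytes, mtime (epoch seconds)
    done
}
```

Typical use would be `snapshot FOLDER > before.txt`, then the rsync command, then `snapshot FOLDER > after.txt` and `diff before.txt after.txt`. Lines that differ only in the inode column point to files that were replaced even though their timestamps were preserved.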

Is anyone doing anything similar to this? If so, how did you resolve this problem? Or is there any way to get the forwarder to stop forwarding this data?


dwaddle
SplunkTrust

Look at the rsync man page entries for the --inplace and --append options:

  --inplace
         This causes rsync not to create a new copy of the file and then move it into place. Instead rsync will overwrite the existing file, meaning that the rsync algorithm can’t accomplish the full amount of network reduction it might be able to otherwise (since it does not yet try to sort data matches). One exception to this is if you combine the option with --backup, since rsync is smart enough to use the backup file as the basis file for the transfer.

         This option is useful for transfer of large files with block-based changes or appended data, and also on systems that are disk bound, not network bound.

         The option implies --partial (since an interrupted transfer does not delete the file), but conflicts with --partial-dir and --delay-updates. Prior to rsync 2.6.4 --inplace was also incompatible with --compare-dest and --link-dest.

         WARNING: The file’s data will be in an inconsistent state during the transfer (and possibly afterward if the transfer gets interrupted), so you should not use this option to update files that are in use. Also note that rsync will be unable to update a file in-place that is not writable by the receiving user.

  --append
         This causes rsync to update a file by appending data onto the end of the file, which presumes that the data that already exists on the receiving side is identical with the start of the file on the sending side. If that is not true, the file will fail the checksum test, and the resend will do a normal --inplace update to correct the mismatched data. Only files on the receiving side that are shorter than the corresponding file on the sending side (as well as new files) are sent. Implies --inplace, but does not conflict with --sparse (though the --sparse option will be auto-disabled if a resend of the already-existing data is required).
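To make the quoted --append rule concrete, here is a simplified model of the per-file decision it describes: new files are sent, files that are shorter on the receiving side get the missing tail appended, and everything else is skipped. This is an illustration under stated assumptions, not rsync's implementation: the `append_action` name is hypothetical, and the model ignores the checksum test and the --inplace resend that the man page mentions. For exports that are written once and never grow, a file that was already downloaded in full lands in the skip case.

```shell
#!/bin/sh
# Simplified model of the --append decision quoted above (not rsync's code).
# Ignores the checksum test and the --inplace resend.
# append_action SRC DST prints "new", "append", or "skip".
append_action() {
    src=$1
    dst=$2
    if [ ! -e "$dst" ]; then
        echo new                       # not on the receiving side yet: sent in full
        return
    fi
    # $(( )) strips any padding that some wc implementations add
    src_size=$(( $(wc -c < "$src") ))
    dst_size=$(( $(wc -c < "$dst") ))
    if [ "$dst_size" -lt "$src_size" ]; then
        echo append                    # receiver is shorter: only the missing tail is sent
    else
        echo skip                      # same length or longer: nothing is sent
    fi
}
```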

msarro
Builder

OK, great. Hopefully this resolves the issue; I'll know tomorrow morning!


nivedita_viswan
Path Finder

Hi @msarro,

I have a similar situation coming up. Do you know whether this resolved the issue?

Thanks


dwaddle
SplunkTrust

I think here's the catch, based on how I understand rsync to work. Without --inplace, when rsync updates a file, it builds a new copy of that file with the update applied, then deletes the original and renames the copy into its place. That should work, but --append is closer to what Splunk expects to see...
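The two update styles described above can be reproduced with plain shell commands, without rsync: replacing a file by writing a new copy and renaming it over the original gives the path a new inode, while appending to the existing file keeps the same one. This is a sketch under assumptions: it needs GNU coreutils `stat`, the function names and the temporary-file name are hypothetical (not necessarily what rsync uses), and this thread does not establish that the inode change is what makes the forwarder re-read the files.

```shell
#!/bin/sh
# Sketch of the two update styles, using shell commands instead of rsync.
# Assumes GNU coreutils stat (-c). Function and temp-file names are hypothetical.
set -eu

# Default-style update: build a new copy next to the original, then rename it over the original.
update_by_rename() {
    file=$1
    new_data=$2
    tmp="$(dirname "$file")/.$(basename "$file").tmp"
    cp -p "$file" "$tmp"
    printf '%s\n' "$new_data" >> "$tmp"
    mv "$tmp" "$file"                  # the path now refers to a different inode
}

# --append-style update: write the new bytes onto the end of the existing file.
update_by_append() {
    printf '%s\n' "$2" >> "$1"         # same inode, file simply grows
}

inode() { stat -c %i "$1"; }
```

Calling `inode` before and after each function on the same file shows the inode changing after `update_by_rename` and staying the same after `update_by_append`.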

msarro
Builder

I'm trying --append now to see whether it works; I'll get back to you. So does rsync always touch the files, no matter what?
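For reference on that question: the rsync man page describes a default "quick check" that treats a file as needing transfer only when its size or last-modified time differs between sender and receiver. Files already fetched with `-a` (which preserves mtimes) are therefore normally skipped rather than rewritten. Below is a simplified model of that check. It assumes GNU coreutils `stat`, uses a hypothetical `needs_transfer` helper, and ignores options such as --checksum, --size-only and --modify-window that change the comparison.

```shell
#!/bin/sh
# Simplified model of rsync's default quick check (not rsync's code):
# transfer if the receiver lacks the file, or if size or mtime differs.
# Assumes GNU coreutils stat (-c).
needs_transfer() {
    src=$1
    dst=$2
    if [ ! -e "$dst" ]; then
        echo yes                       # missing on the receiving side
    elif [ "$(stat -c '%s %Y' "$src")" = "$(stat -c '%s %Y' "$dst")" ]; then
        echo no                        # same size and mtime: left alone by default
    else
        echo yes                       # size or mtime differs
    fi
}
```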
