Justin's writings and TILs

Partial transfers with rsync

Need to transfer huge files with rsync and are afraid it will stop halfway? Make sure to use -P.

tl;dr:

There’s a lot of confusing information online about rsync, one of the best tools for unidirectional synchronization of filesystem data. People are sometimes confused that in the default configuration, if a transfer is canceled, the partial file is NOT kept. Then they see that they need to add --partial, and are further frustrated to return after a crash to see that the partial file is not there. What gives?

The short explanation is this: If rsync crashes, you probably have a mess on your filesystem. There’s no perfect solution in that case, and you should expect to do some cleanup yourself.

The rsync mental model

So what’s going on here? rsync is all about transferring the state of files from a source to a destination. It assumes that you want your files to remain consistent. That is, by default, rsync will not tolerate a partial transfer. If you have version 1 of a file on your machine, and you are trying to sync down version 2 of a file from your server, it assumes you do not want some partial Frankenstein file. This applies to new files as well, which at least ensures you either have the file or you do not.

During a transfer, the new data for the destination is stored in a temporary file. rsync builds this temporary file from pieces of your existing local file plus new chunks fetched from the source. Since local data can be used, resuming from partial existing files is built-in! You do not need to add --partial or --append to efficiently download new parts of a file. rsync just does that.

When a transfer is complete, the temporary file is moved into place at the destination: a quick atomic update.

If the rsync transfer is interrupted, the contents of the temporary file are called a partial file, and by default it is deleted. This distinction between the temporary file (in progress) and the partial file (left over after an interruption) is key for understanding what the rsync options mean.
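To make the model concrete, here is a minimal shell sketch of the write-to-a-temporary-file-then-rename pattern. It is not rsync's actual code (rsync also reuses chunks of the existing destination file, which this skips), and copy_atomically is a name I made up for illustration.

```shell
# Sketch of the temp-file-then-rename pattern described above.
# Not rsync's implementation; copy_atomically is a hypothetical helper.
copy_atomically() {
    src=$1
    dest=$2
    # Hidden temp file next to the destination, so the final mv stays on
    # one filesystem and is a rename rather than a copy.
    tmp=$(mktemp "$(dirname "$dest")/.$(basename "$dest").XXXXXX") || return 1
    if cat "$src" > "$tmp"; then
        mv -f "$tmp" "$dest"    # success: swap the finished file into place
    else
        rm -f "$tmp"            # failure: discard it, like rsync's default
        return 1
    fi
}
```

Either the destination ends up as the complete new file, or it is left exactly as it was. The in-between state only ever lives in the hidden temporary file.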

Interrupted transfers

When transferring huge files, this default configuration can be a problem if a transfer is interrupted. For example, maybe you decide to hit Ctrl+C. By default, the temporary file is deleted, and your transfer progress is effectively lost.

And that’s where --partial comes in (-P is shorthand for --partial --progress). This flag says that when the transfer is shutting down, rsync should move the temporary file into place at the destination anyway. By adding this flag, you are accepting that the destination file might be a weird partial file, but that’s what you want! The next run can then use that partial file as local data and fetch only what is missing.
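If you want the restart to happen without babysitting, one option is a small retry loop around rsync -P. This is a sketch under my own assumptions: resumable_rsync, RSYNC_BIN, MAX_TRIES, and RETRY_DELAY are made-up names, and it only helps when rsync exits on its own with an error (a dropped connection, say), not when it is killed outright.

```shell
# Hypothetical helper: rerun rsync with -P until it exits successfully.
# RSYNC_BIN, MAX_TRIES and RETRY_DELAY are names invented for this sketch.
resumable_rsync() {
    tries=0
    while [ "$tries" -lt "${MAX_TRIES:-5}" ]; do
        # -P is shorthand for --partial --progress, so a failed attempt
        # leaves its partial file at the destination for the next one.
        if "${RSYNC_BIN:-rsync}" -P "$@"; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "${RETRY_DELAY:-10}"
    done
    return 1
}
```

You would call it with the same arguments you would give rsync, for example resumable_rsync -avz followed by your source and destination. Each retry starts from whatever partial file the previous attempt left behind.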

This doesn’t solve your problem if rsync crashes. In that case, the .file.random-letters file is left in place, and you have a mess. Whether rsync deletes or moves the temporary file depends on rsync getting the chance to run its cleanup procedures before it exits. And no, rsync will not rediscover these orphaned temporary files on a subsequent run.

In this case, you can still recover: just mv the temporary file into place yourself, which is what rsync would have done. You have to do this every time rsync crashes. Even if you don’t, that temporary file will be sitting there taking up space.
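That manual mv can be wrapped in a small helper. This is only a sketch: adopt_orphan is a made-up name, and it assumes the orphan sits next to the destination and follows the .file.random-letters naming with six random characters. Make sure rsync is no longer running before you try it.

```shell
# Hypothetical helper: move an orphaned rsync temp file into place.
# Assumes the temp file is named ".<name>.<six random characters>"
# and lives in the same directory as the destination file.
adopt_orphan() {
    dest=$1
    dir=$(dirname "$dest")
    base=$(basename "$dest")
    # Collect candidate temp files into the positional parameters.
    set -- "$dir"/."$base".??????
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one temp file for $dest" >&2
        return 1
    fi
    mv -f -- "$1" "$dest"
}
```

It refuses to guess when it finds zero or several candidates, since picking the wrong one would replace your destination file. After it succeeds, run rsync again to fill in the rest of the file.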

Ah, and then there’s --inplace, which gives you a similar benefit. That avoids the use of temporary files altogether. It works, and even survives crashes, but it’s a bit of an odd flag to get accustomed to using. Not only will you see files in an inconsistent partial state, but you’ll see them as normal files while rsync is writing to them.

But what about…

--partial-dir? This is a confusing one. On an initial run, this functions similarly to --partial, but saves the file into the given directory instead of the destination. However, if there is already a file in the partial-dir, that file is used as the temporary file for the transfer. This means that as long as it saved once before, it can resume after a future crash. I’m really not sure why they don’t just create the partial as the initial temporary file.
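As a usage sketch, this is roughly how I'd wire --partial-dir into a wrapper. pull_resumable and the .rsync-partial directory name are both made up for illustration; a relative directory name like this is resolved inside each destination file's directory.

```shell
# Hypothetical wrapper; ".rsync-partial" is an arbitrary directory name.
pull_resumable() {
    # Interrupted transfers are parked in .rsync-partial/ next to the
    # destination file instead of under the final file name.
    rsync --partial-dir=.rsync-partial --progress "$@"
}
```

Call it like rsync itself, passing your source and destination. If a transfer is interrupted, the partial data should end up in the .rsync-partial directory rather than under the final name, so the destination never shows a half-written file.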

--append? That tells rsync to assume that the data already at the destination matches the start of the source file, and to just add the missing data to the end of the destination. Without this flag, rsync compares the existing destination data against the source and transfers whatever changed, rather than trusting it blindly.

--checksum? If two files have the same modification time and size, rsync assumes they are the same. This flag ensures they are checked based on hashes of their contents. You do not need this flag to enable efficient transfers. It’s useful if you’re concerned about data being corrupted.
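To see why the default check can miss corruption, here is a rough shell approximation of the two comparisons. These functions are my own illustration, not rsync commands, and rsync's real logic has more cases than this.

```shell
# Approximation of the "same size and modification time" test described
# above; quick_check_same is a made-up name, not an rsync command.
# -nt and -ot are widely supported test extensions (bash, dash, ksh).
quick_check_same() {
    size1=$(($(wc -c < "$1")))
    size2=$(($(wc -c < "$2")))
    [ "$size1" -eq "$size2" ] &&
        [ ! "$1" -nt "$2" ] && [ ! "$1" -ot "$2" ]
}

# What --checksum adds, conceptually: compare the bytes themselves.
content_same() {
    cmp -s "$1" "$2"
}
```

Two files with the same length and timestamp but different bytes pass quick_check_same and fail content_same. That is the situation --checksum exists to catch, at the cost of reading every file in full.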

“I started a transfer and forgot -P. Am I doomed?” No. If it exits normally, rsync will delete your temporary file, so don’t let that happen. You can kill -9 rsync on purpose :^) and then mv the leftover temporary file into place as described above.

As for me, I’m trying to commit -Pavz to memory to make all this easier: -P for --partial --progress, -a for archive mode, -v for verbose output, and -z for compression.