Moving file atomically with S3 AWS plugin

Hi there!

Sidenote: I had to jump through quite a few hoops because I wanted to be helpful and link to various source and documentation files, but Discourse won’t allow me to use more than 2 links in a post. I’ve spent more time trying to figure out how to get around this, or how to become trusted, than it took to write this entire post.

Here are the URLs I’m trying to link to, annotated with [label]* in the post:

  • The guide: https://shrinerb.com/docs/changing-location#1-update-the-location-generation
  • Direct upload guide: https://shrinerb.com/docs/direct-s3#strategy-a-dynamic
  • Source (using id as the location): https://github.com/shrinerb/shrine/blob/master/lib/shrine.rb#L204-L217
  • Source (atomic persist equality): https://github.com/shrinerb/shrine/blob/master/lib/shrine/uploaded_file.rb#L222-L229

Following this guide*, I’m trying to move files.

S3 and the file ID

When there is no generate_location present (so no custom method and no pretty_location plugin, or any other plugin that defines a generate_location method), Shrine’s default is to store the file at a location that is used as its id, as described in the Direct Upload guide*, which states that the "id" is used to determine where the file is stored on S3.

According to [this source code]*, the location is used as the "id" value.

I don’t know if this is standard for all storage backends, but it was enough to find the (potential) underlying issue.
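For reference, whatever generate_location returns becomes both the S3 object key and the uploaded file’s id. A minimal sketch of step 1 of the guide (the uploader name and path scheme below are just illustrative, not taken from the guide):

class ImageUploader < Shrine
  # Whatever this returns is used as the storage location, i.e. the file's "id"
  def generate_location(io, record: nil, **context)
    # hypothetical path scheme; `super` gives Shrine's random default location
    ["uploads", record&.id, super].compact.join("/")
  end
end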

Atomic persist

The atomic persist implementation does an equality check on the uploaded file(s). That’s implemented [here]*. In short, it compares the type (class), id, and storage_key (e.g. "cache" / "store"). A file with the exact same id and class but in a different storage is, as expected, a different file. If everything matches, it’s the same file.
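Paraphrasing the linked source (not copied verbatim), the check boils down to:

# UploadedFile equality, roughly as implemented in the linked source
def ==(other)
  self.class == other.class &&
    id == other.id &&
    storage_key == other.storage_key
end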

When changing metadata or derived versions, this is great. You can generate new derivatives or updated metadata, and throw the work away if the underlying file changed since the start of that process.

Problem

[The guide]* lays down the following steps:

  1. change or implement generate_location
  2. migrate (move) the existing files, either directly or via backgrounding
# Example code for step 2

begin
  attacher.atomic_persist           # persist changes if attachment has not changed in the meantime 
  old_attacher.destroy_attached     # delete files on old location 
rescue Shrine::AttachmentChanged,   # attachment has changed during reuploading
       ActiveRecord::RecordNotFound # record has been deleted during reuploading
  attacher.destroy_attached         # delete now orphaned files 
end
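For completeness, that snippet assumes a setup along these lines earlier in the guide (variable names as I remember them; old_attacher is a copy taken before re-uploading):

attacher     = record.image_attacher
old_attacher = attacher.dup                  # keeps a reference to the file at the old location
attacher.set attacher.upload(attacher.file)  # re-upload the file to its new location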

Since S3 moves a file by changing its ID, and the ID is used to determine equality, atomic_persist will always throw a Shrine::AttachmentChanged error: moving changes the location, which changes the ID, which makes the files unequal.
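Concretely (with made-up ids), the file the database row still references and the re-uploaded file compare as different:

old_file = Shrine.uploaded_file("id" => "old/key", "storage" => "store")  # what the DB still points to
new_file = Shrine.uploaded_file("id" => "new/key", "storage" => "store")  # what the attacher now holds
old_file == new_file  #=> false, so atomic_persist raises Shrine::AttachmentChanged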

Questions

  1. Is the guide incorrect? Or is the guide incomplete?
  2. Is there a Shrine-idiomatic solution for S3?
  3. Is using non-atomic persist "Good enough"™?

Workaround

I’m using #persist instead of #atomic_persist right now, and I can get some atomicity by wrapping persist in a transaction and checking that the record’s original filedata hasn’t changed in the meantime. It works.
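Roughly, with the attachment named image and ActiveRecord assumed (a simplified sketch, not the exact code):

record.class.transaction do
  fresh = record.class.lock.find(record.id)          # row lock while we verify nothing changed
  raise Shrine::AttachmentChanged if fresh.image_attacher.file != old_attacher.file
  attacher.persist                                   # plain, non-atomic persist
end
old_attacher.destroy_attached                        # clean up the files at the old location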

I hope everything is clear. I can set up a minimal reproducible example, but this is all copied straight from the guide.

You are correct, this was a mistake in the guide. I’ve now corrected it to pass the current file to the Attacher#atomic_persist method.
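In the migration snippet above that means verifying against the file as it was before re-uploading, i.e. something like:

attacher.atomic_persist(old_attacher.file)  # compare against the pre-migration file, not the re-uploaded one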

It’s not terribly important to use atomic persisting; in particular, it’s not necessary if attachments are not editable for a given record.

Sorry about this, I forgot Discourse has this default; I’ve now removed the limit on the number of links.


Thank you for that change 🙂

My questions have been answered and resolved.