Shrine saving file as empty

I recently upgraded to Shrine 3 and it’s been mostly seamless with the exception of 1 hiccup.

I have one model that saves an incoming string as a file.

In Shrine2, I wrapped it in the following class and saved it like this

# frozen_string_literal: true

class IOObject
  attr_accessor :string_io, :original_filename

  delegate :read, :rewind, :eof?, :close, :size, to: :string_io

  def initialize(string, filename)
    @string_io = StringIO.new(string)
    @original_filename = filename
  end
end


feed.uploaded_file = IOObject.new(csv, filename)
feed.save

Previously, this worked fine, but now, intermittently, it will save an empty file when I know there was data there.

I’m looking to see if it could be something else, but the behavior only started after the gem upgrade and I want to make sure I haven’t missed something in the guide.

I don’t know what change in Shrine 3 could be causing an empty file to be uploaded, especially intermittently.

What in general causes an empty file to be uploaded is if the IO object wasn’t rewinded somewhere after its content was read. Once a user was calculating a MD5 hash of the content for upload location, but they forgot to rewind the IO object after reading the content, causing an empty file to be uploaded. But that would then happen consistently.

When you say that the file was empty, did you check using Shrine (i.e. file.uploaded_file.download), or did you check directly in the browser/storage? I would recommend the latter just to eliminate any potential bug in #download. Also, which storage are you using? S3 shouldn’t allow you to upload an empty file if the size attribute of the IO object was not zero.

Could you post here your Shrine setup (initializer + uploader)? Also, you could try reproducing this bug in a self-contained script, which would guarantee that the bug will be found :smiley:

I’ve added a rewind before the save, but am waiting for review before I can deploy, I did wonder if that could be causing it.

We’re just using file storage, so I check it using feed.uploaded_file.read as well as looking at the file in the directory.

The issue is occurring during a resque job that runs every morning. When I run the job manually in the console, it works as expected.

initializer:

# frozen_string_literal: true

require "shrine"
require "shrine/storage/file_system"

Shrine.storages = {
  cache: Shrine::Storage::FileSystem.new("public", prefix: "system/shrine_cache"), # temporary
  store: Shrine::Storage::FileSystem.new("public", prefix: "system", clean: false) # permanent
}

Shrine.plugin :activerecord
Shrine.plugin :determine_mime_type
Shrine.plugin :refresh_metadata
Shrine.plugin :validation_helpers
Shrine.plugin :determine_mime_type

uploader:

# frozen_string_literal: true

class DynamicAdFeedUploader < Shrine
  plugin :add_metadata
  plugin :dynamic_ad_feed_naming
  plugin :signature

  Attacher.validate do
    validate_mime_type_inclusion ContentType::CSV
  end

  add_metadata :fingerprint do |io, _context|
    calculate_signature(io, :md5)
  end
end

What is the dynamic_ad_feed_naming plugin doing? Also, could you share the relevant Resque job code where the bug occurs?

Yeah, it would be useful to verify whether the file was not rewinded before attaching, or this happened sometimes during attachment.

class Shrine
  module Plugins
    module DynamicAdFeedNaming
      module InstanceMethods
        def generate_location(_io, context)
          id = context[:record].id
          name = Sifi.anonymize_filename(context[:metadata]["filename"])

          ['uploaded_assets', 'dynamic_ad_feeds', id, name].compact.join("/")
        end
      end
    end

    register_plugin(:dynamic_ad_feed_naming, DynamicAdFeedNaming)
  end
end

It’s just setting a location.

There’s some secret sauce in the service that’s called so I can’t share the whole thing, but the worker makes the same Service.call that I can call from the console or that is triggered by the UI.

I’m going to add the rewind right before the save and add a ton of logging. I was hoping I was doing something just blatantly wrong, but it might be something more subtle.

It was definitely the rewind.

After saving, IF the file had changed, then some additional work was done by reading the file and working on the data. The object was then saved after that and AR picked up the changed file position and saved an empty file.

Some more rewinds and it’s fine.

It was a race condition between caching and storage. I’m going to use the attacher directly.

hmm, I need validations so using the attacher directly to permanent storage won’t work.

Immediately after the save, I pass the object to another service that uses the data in the file to do some work. It seems like the file isn’t finished being moved to permanent storage until after this. I’ve surrouned my read calls with rewind, but I’m still a little bit leery that I might catch the update just right and still wind up with an empty file.

Is promotion done in a fork or thread?

In the same method as the save, is there a way to guarantee that the file has been moved to permanent storage before attempting to read it back out?

Is promotion done in a fork or thread?

That depends on which backgrounding library you’ve set up for Shrine’s backgrounding plugin. Resque uses forking, while Sidekiq uses threads. By default Shrine does promotion synchronously.

In the same method as the save, is there a way to guarantee that the file has been moved to permanent storage before attempting to read it back out?

What do you mean exactly? You can do something like photo.image if photo.image.storage_key == :store or photo.image if photo.image_attacher.stored?. But if you want something done after promotion, and you’ve configured Shrine to use backgrounding, then you’d put that code in a background job. Otherwise you can just execute code after model.save.

We’re not using the backgrounding plugin at all.

This is simplified, but our flow goes like this

feed.uploaded_file = IOObject.new(csv, filename)
feed.save
puts "after save"
puts "before service call"
result = AnotherService.call(feed: feed)
puts "after service call"

Inside AnotherService, it makes a call to feed.uploaded_file.read

  Feed Update (0.6ms)  UPDATE "feeds" SET "updated_at" = $1, "uploaded_file_data" = $2 WHERE "feeds"."id" = $3  [["updated_at", "2020-01-14 13:52:39.551557"], ["uploaded_file_data", "{\"id\":\"uploaded_assets/feeds/5014/68356.csv\",\"storage\":\"cache\",\"metadata\":{\"filename\":\"b7ab44b838cf51853ff0c1f6af010cfd.csv\",\"size\":249362,\"mime_type\":\"text/plain\",\"fingerprint\":\"323d86e31e4327529f8d3e1f1c3a88b7\"}}"], ["id", 5014]]
after save
before service call
after service call
   (2.0ms)  COMMIT
   (0.1ms)  BEGIN
  Feed Update (5.6ms)  UPDATE "feeds" SET "updated_at" = $1, "uploaded_file_data" = $2 WHERE "feeds"."id" = $3  [["updated_at", "2020-01-14 13:52:40.776797"], ["uploaded_file_data", "{\"id\":\"uploaded_assets/feeds/5014/68356.csv\",\"storage\":\"store\",\"metadata\":{\"filename\":\"b7ab44b838cf51853ff0c1f6af010cfd.csv\",\"size\":249362,\"mime_type\":\"text/plain\",\"fingerprint\":\"323d86e31e4327529f8d3e1f1c3a88b7\"}}"], ["id", 5014]]

before I added the feed.uploaded_file.rewind after the read, I was getting empty files saved.

So after the save, if I did any work on the file from the AR object w/out rewinding, it would happen before the file persistence happened.

Are you wrapping all of this in a transaction? If that’s the case, Shrine won’t promote the cached file until after the outer transaction has been committed, as it uses an after_commit callback. The reason is that having slow things such as uploads happen inside a transaction lengthens the time transaction needs to be open, which can have performance implications.

If the code you posted is not wrapped with a transaction, then promotion must be finished after the save call, nothing else can delay it.

ah, that’s it. Thanks so much. That makes total sense and for some reason I just didn’t even consider it.