Getting into the nitty-gritty details of #download

I am replacing this code, which works with Paperclip, with a new implementation using Shrine 3.2.1.

  def unpack
    if zip?
      tmp = Tempfile.new('unzip', :encoding => 'ascii-8bit')
      source = File.join('https://server.example.com', file_uid.gsub(' ', '+'))
      begin
        tmp.write( open(source, "rb").read )
        zipfile = Zip::File.open(tmp)
        zipfile.each do |entry|
          if entry.file? && ! File.basename(entry.name).match(/^\./)
            if entry.name.downcase.match(/\.(gif|jpeg|jpg|png)$/)
              extracted_file = zipfile.read(entry.name)
              new_image = self.title.images.build
              new_image.file = extracted_file
              new_image.file_name = File.basename(entry.name)
              new_image.save!
            elsif entry.name.downcase.match(/\.(pdf|mov|mp3|mp4|m4a|xml)$/)
              extracted_file = zipfile.read(entry.name)
              new_file = self.title.sources.build
              new_file.file = extracted_file
              new_file.file_name = File.basename(entry.name)
              new_file.content_format = new_file.parse_content_format
              new_file.save!
            end
          end
        end if zipfile
      ensure
        tmp.close
        tmp.unlink
      end
      self.destroy
    end
  end

Here’s as far as I’ve gotten:

  def unpack
    return unless file_name&.end_with? '.zip'

    tmp = file.download
    begin
      zipfile = Zip::File.open(tmp)
      zipfile.each do |entry|
        if entry.file? && ! File.basename(entry.name).match(/^\./)
          if entry.name.downcase.match(/\.(gif|jpeg|jpg|png)$/)
            extracted_file = zipfile.read(entry.name)
            new_image = self.title.images.build
            new_image.file = extracted_file
            new_image.file_name = File.basename(entry.name)
            new_image.save!
          elsif entry.name.downcase.match(/\.(pdf|mov|mp3|mp4|m4a|xml|epub|mobi|dat)$/)
            extracted_file = zipfile.read(entry.name)
            new_file = self.title.sources.build
            new_file.file = extracted_file
            new_file.file_name = File.basename(entry.name)
            new_file.save!
          end
        end
      end if zipfile
    ensure
      tmp.close
      tmp.unlink
    end
    self.destroy
  end

This gets as far as starting to loop through the contents of the Zip file, but then fails on the new_file.file = extracted_file assignment, with a lengthy error beginning with 767: unexpected token at '%PDF-1.5 %âãÏÓ 1853 0 obj <</Linearized 1/L 1556554/O 1857/E 25187/N 488/T 1519445/H [ 936 4018]>> endobj xref 1853 32 0000000016 00000 n 0000004954 00000 n 0000005157 00000 n 0000000936 00000 n 0000005220 00000 n 0000005359 00000 n 0000005460 00000 n 0000005575 00000 n 0000005937 00000 n 0000006625 00000 n 0000006797 00000 n 0000007423...

My best guess is that this has something to do with the finer details of the working code, such as setting the encoding of the tempfile to ascii-8bit and reading the file as binary by passing the "rb" mode to open. I have tried a few different ways to pass those arguments to Shrine’s #download method, but I have not figured out how to do that yet.

Is there a default encoding that Shrine always uses when engaging the #download method? Can it be influenced? Is there something wrong with the way that I’m approaching the assignment? Should I create a tempfile for each “part” of the Zip and stage the data there first before assigning it to the Shrine attachment?

Thanks in advance for any help,

Walter

Hi, I think the problem is that extracted_file might be a string of binary content, whereas Shrine expects an IO-like object. If you try to assign a string, it will assume it’s JSON data of an uploaded file and try to parse it, resulting in a JSON::ParserError.

A simple fix is to wrap the string content in a StringIO, and give that to Shrine:

new_image.file = StringIO.new(extracted_file)
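To illustrate the difference, here is a minimal sketch that uses only the Ruby standard library and does not involve Shrine at all. The byte string is a made-up stand-in for what zipfile.read returns. Parsing it as JSON fails, which is consistent with the error quoted above, while the StringIO wrapper responds to the IO-style methods an uploader would typically call.

```ruby
require "json"
require "stringio"

# Hypothetical stand-in for the bytes returned by zipfile.read(entry.name).
extracted_file = "%PDF-1.5\n1 0 obj".b

# A bare String is not an IO object; treated as JSON, it fails to parse.
parse_failed =
  begin
    JSON.parse(extracted_file)
    false
  rescue JSON::ParserError
    true
  end

# Wrapping the same bytes in a StringIO yields an IO-like object.
io = StringIO.new(extracted_file)
io_like = %i[read rewind eof? close size].all? { |m| io.respond_to?(m) }
```

The string itself is untouched by the wrapper, so io.read returns exactly the bytes that were extracted.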

Thanks very much for the hint. I have that working so far, but now I am stuck with the metadata assignment for these unpacked files. They are being stored with mostly nulls (except for the binary data itself). Here’s an example:

{"id":"20a2d30c1be2a6df112f58c8c8f95969","storage":"store","metadata":{"filename":null,"size":1622017,"mime_type":null}}

I have access to the original filename inside my creation loop, but even if I assign it there to the record attribute, Shrine wipes it out when it writes the file, because I am using the metadata_attributes plugin.

Is there a way to assign a “filename” to a StringIO? I don’t see anything like that in the documentation. Failing that, is there a way to force the metadata values before writing in Shrine when you aren’t reading from a real file?

Thanks again,

Walter

You can override metadata per-assignment via the :metadata option (in that case you’ll need to switch to calling Attacher#assign directly).

new_image.file_attacher.assign StringIO.new(extracted_file), metadata: { "filename" => "..." }
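On the narrower question of giving a StringIO a filename: as far as I know, Shrine’s default metadata extraction asks the IO for original_filename and content_type when it responds to those methods, so a small subclass is another possible route. This is only a sketch and has not been run against this app. NamedStringIO is a hypothetical name, not part of Shrine, and the MIME type is just an example value.

```ruby
require "stringio"

# Hypothetical helper (not part of Shrine): a StringIO that also carries
# the upload-style attributes that metadata can be read from.
class NamedStringIO < StringIO
  attr_reader :original_filename, :content_type

  def initialize(content, filename:, content_type: nil)
    super(content)
    @original_filename = filename
    @content_type = content_type
  end
end

io = NamedStringIO.new("%PDF-1.5".b, filename: "chapter.pdf",
                                     content_type: "application/pdf")
```

If the assumption about metadata extraction holds, assigning io with new_image.file = io should populate the filename without calling the attacher directly. The :metadata option shown above remains the more explicit choice.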

Thank you! This worked perfectly.

For posterity, here’s my final unpacker:

  def unpack
    return unless file_name&.end_with? '.zip'

    tmp = Tempfile.new('unzip', :encoding => 'ascii-8bit')
    begin
      file.stream(tmp)
      tmp.rewind
      zipfile = Zip::File.open(tmp)
      zipfile.each do |entry|
        if entry.file? && ! File.basename(entry.name).match(/^\./)
          if entry.name.downcase.match(/\.(gif|jpeg|jpg|png)$/)
            extracted_file = StringIO.new(zipfile.read(entry.name), 'rb')
            new_image = self.title.images.build
            new_image.file_attacher.assign extracted_file, metadata: { "filename" => File.basename(entry.name) }
            new_image.save!
          elsif entry.name.downcase.match(/\.(pdf|mov|mp3|mp4|m4a|xml|epub|mobi|dat)$/)
            extracted_file = StringIO.new(zipfile.read(entry.name), 'rb')
            new_file = self.title.sources.build
            new_file.file_attacher.assign extracted_file, metadata: { "filename" => File.basename(entry.name) }
            new_file.save!
          end
        end
      end if zipfile
    ensure
      tmp.close
      tmp.unlink
    end
    self.destroy
  end

Walter

Glad to hear!

Note that Shrine already sets binary encoding for tempfiles, so you should be able to use Shrine::UploadedFile#download, and then you can use a block which ensures the tempfile is closed and deleted afterwards:

file.download do |tmp|
  zipfile = Zip::File.open(tmp)
  zipfile.each do |entry|
    # ...
  end
end

Though it uses binmode: true instead of encoding: "ascii-8bit", I’m not sure whether that’s equivalent.
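One way to check this locally is to open a tempfile each way, write the same bytes, and compare what comes back. A small sketch using only the standard library follows; the sample bytes are made up (a Zip signature followed by a few awkward bytes), and it says nothing about what Shrine does internally.

```ruby
require "tempfile"

# Made-up sample: Zip local-file signature plus bytes that text handling could mangle.
bytes = "PK\x03\x04\xFF\x00\r\n".b

# Open one tempfile per option set, write the bytes, and read them back.
results = [{ binmode: true }, { encoding: "ascii-8bit" }].map do |opts|
  tmp = Tempfile.new("unzip", **opts)
  begin
    tmp.write(bytes)
    tmp.rewind
    { external: tmp.external_encoding, binmode: tmp.binmode?, data: tmp.read }
  ensure
    tmp.close
    tmp.unlink
  end
end
```

On a Unix-like system I would expect both variants to report a binary external encoding and return identical bytes. The practical difference is that binmode also turns off newline conversion, which should only matter on Windows.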

Thanks! I saw that, and tried it first, but I was still getting the mangled contents, so I switched to what I have above. But that was before you showed me the light about assign, so maybe I have gone all belt-and-suspenders here. I know that the code I am upgrading used that very specific encoding for Zip files decoded with Zip::Zip, so that may still have some bearing. I wrote that many years ago, and I no longer recall where I got that specific recipe.

Walter

And that was all the encouragement I needed to clean it up a bit further. Your suggestion worked just fine, and now there’s a lot less copy-pasta in my code:

  def unpack
    return unless file_name&.end_with? '.zip'

    file.download do |tmp|
      zipfile = Zip::File.open(tmp)
      zipfile.each do |entry|
        next unless entry.file?
        next if File.basename(entry.name).match(/^\./)
        extracted_file = StringIO.new(zipfile.read(entry.name), 'rb')
        new_file = record_for(entry)
        new_file.file_attacher.assign extracted_file, metadata: { 'filename' => File.basename(entry.name) }
        new_file.save!
      end if zipfile
    end
    self.destroy
  end
  
  def record_for(entry)
    if entry.name.downcase.match(/\.(gif|jpeg|jpg|png)$/)
      self.title.images.build
    elsif entry.name.downcase.match(/\.(pdf|mov|mp3|mp4|m4a|xml|epub|mobi|dat|html)$/)
      self.title.sources.build
    else
      fail Source::ExtractionError, "Unrecognized file type: #{File.basename(entry.name)}"
    end
  end
  
  class ExtractionError < StandardError
  end
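The dispatch in record_for can also be exercised without a database. The following is a sketch under that assumption: kind_for is a hypothetical helper that returns a symbol instead of building a record, and it compares File.extname against lists rather than using the regular expressions above. It also folds in the hidden-file check, which is what skips the "._" resource-fork entries that macOS adds to Zip archives.

```ruby
IMAGE_EXTS  = %w[.gif .jpeg .jpg .png].freeze
SOURCE_EXTS = %w[.pdf .mov .mp3 .mp4 .m4a .xml .epub .mobi .dat .html].freeze

# Hypothetical helper: classify a Zip entry name as :image, :source, or :skip.
# Raises ArgumentError for an extension that is in neither list.
def kind_for(entry_name)
  base = File.basename(entry_name)
  return :skip if base.start_with?(".")

  ext = File.extname(base).downcase
  if IMAGE_EXTS.include?(ext)
    :image
  elsif SOURCE_EXTS.include?(ext)
    :source
  else
    raise ArgumentError, "Unrecognized file type: #{base}"
  end
end
```

Keeping the extension lists in constants means a new format is a one-line change, and the classification can be unit tested without touching storage.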