Issues uploading PDFs with a few large photos

I’ve been using Shrine for quite some time now and it’s been working great. I recently upgraded from Rails 4 to Rails 5, and since that switch there seems to be an issue with Shrine (v2.9.0).

Uploading smaller PDF documents causes no issues, but when a document contains a few large images, the uploader seems to fail silently and hangs on the following line:

uploaded_file = uploader.upload(File.open(tempfile.path))

Any ideas? Nothing is logged when it freezes: no errors, nothing at all. I’m not sure what the problem is or how to troubleshoot it.

Relevant files are below.

# initializers/shrine.rb
# frozen_string_literal: true

require 'shrine/storage/s3'

s3_options = {
  access_key_id: ENV['aws_access_key_id'],
  secret_access_key: ENV['aws_secret_access_key'],
  region: ENV['aws_region'],
  bucket: ENV['aws_bucket']
}

Shrine.storages = {
  cache: Shrine::Storage::S3.new(prefix: 'cache', upload_options: { acl: 'public-read' }, **s3_options),
  store: Shrine::Storage::S3.new(prefix: 'store', upload_options: { acl: 'public-read' }, **s3_options)
}

Shrine.plugin :activerecord

Shrine.plugin :presign_endpoint, presign_options: lambda { |request|
  filename     = request.params['filename']
  extension    = File.extname(filename)
  content_type = Rack::Mime.mime_type(extension)

  {
    content_disposition: "inline; filename=\"#{filename}\"", # download with original filename
    content_type: content_type # set correct content type
  }
}

Shrine.plugin :cached_attachment_data
Shrine.plugin :determine_mime_type
Shrine.plugin :keep_files, destroyed: true, replaced: true
Shrine.plugin :logging

# job_report_pdf_generate_worker.rb
# frozen_string_literal: true

class JobReportPdfGenerateWorker < ApplicationJob
  # Mix the helpers into the job itself rather than into Object at the top level
  include ApplicationHelper
  include BatchesHelper

  queue_as :default

  def perform(job_report_id)
    job_report = JobReport.find(job_report_id)
    report = Report.find(job_report.report_id)
    settings = job_report.job_report_settings.includes(:report_setting)

    job = Job.find(job_report.job_id)
    tests = job.tests
    samples = job.samples
    results = job.results.sort_by_position_then_sample.group_by(&:sample_id)
    batches = job.batches
    attachments = job.attachments_included_in_test_report

    begin
      content = ActionController::Base.new.render_to_string(
        template: "reports/custom/#{report.partial_path}/#{report.partial_path.split('/')[1]}",
        locals: {
          :@job => job,
          :@tests => tests,
          :@samples => samples,
          :@results => results,
          :@batches => batches,
          :@attachments => attachments,
          :@job_report => job_report,
          :@settings => settings
        },
        layout: 'pdf.html.erb'
      )

      response = $docraptor.create_async_doc(
        test: ENV['docraptor_testing'], # test documents are free but watermarked
        document_content: content, # supply content directly
        # document_url:   "http://docraptor.com/examples/invoice.html", # or use a url
        name: "job report #{job_report.id}.pdf", # help you find a document later
        document_type: 'pdf', # pdf or xls or xlsx
        # javascript:       true,                                       # enable JavaScript processing
        prince_options: {
          media: 'screen' # use screen styles instead of print styles
          #   baseurl: "http://hello.com",                                # pretend URL when using document_content
        }
      )

      # Poll for up to ~15 minutes (900 one-second sleeps) while DocRaptor renders the PDF asynchronously
      900.times do
        status_response = $docraptor.get_async_doc_status(response.status_id)
        case status_response.status
        when 'completed'
          doc_response = $docraptor.get_async_doc(status_response.download_id)

          tempfile = Tempfile.new([job_report.id.to_s, '.pdf'], Rails.root.join('tmp'))
          tempfile.binmode
          tempfile.write doc_response
          tempfile.close

          uploader = DocRaptorUploader.new(:store)
          uploaded_file = uploader.upload(File.open(tempfile.path))
          job_report.document_data = uploaded_file.to_json
          job_report.pdf_updated_at = Time.now

          job_report.save

          tempfile.unlink

          job_report.update_columns(status: 'Success')
          break
        when 'failed'
          puts 'FAILED'
          puts status_response
          break
        else
          sleep 1
        end
      end

      job_report.update_columns(status: 'Error') if job_report.reload.status == 'Processing'
    rescue StandardError => se
      job_report.update_columns(status: "Error - #{se}")
    end
  end
end

Does this happen when you are working locally on your own computer, or on your server? If you’re on a server, is it Heroku? This could be related to your web server (Apache or Nginx) if that’s under your control. Perhaps something is set there related to total POST size, for example.

It happens locally and on Heroku (both running Puma). How can I verify whether the web server is the problem, and do you have any ideas on how to address it?

I’ve had a look at the Puma docs, and at puma -h in the CLI, and I don’t see anything likely. I’ve never tried to use Puma in production, so you may want to explore other options. Most of the help I found on Stack Overflow and elsewhere (a quick DuckDuckGo search for “rails large file upload tuning”) gave me advice about Apache, some of it specific to SSL, which I didn’t realize could affect this.

Good luck with this! I know that it can be done: I had a site on AWS with Apache and Rails 3 that we tested with uploads of over 1 GB. That was just Apache 2.2.x, Ruby 2.3, and Rails 3.something with Paperclip. Files were uploaded to the server and then re-uploaded from there to S3.

Since you’re on Heroku, there’s another layer of difficulty: you are tightly constrained in the size of files you’re allowed to upload on Heroku. For that reason, most applications deployed there use direct uploads to S3, completely bypassing any form of local storage.

Have a look at the Shrine + Uppy example in the docs – I believe that it does this sort of direct upload by default.
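
If I’m reading the docs right, the server side of that amounts to mounting Shrine’s presign endpoint, roughly like the sketch below (the mount path is just an example; Uppy’s AwsS3 plugin would then fetch its upload parameters from it). Your initializer already loads the presign_endpoint plugin, so it would be something like:

# config/routes.rb
Rails.application.routes.draw do
  # Returns presigned S3 upload parameters for the client-side uploader
  mount Shrine.presign_endpoint(:cache) => '/presign'
end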

Walter

I’m surprised that a Rails upgrade would affect behaviour here, because Shrine itself doesn’t use Rails. Did you also happen to upgrade the Ruby version or the aws-sdk-s3 gem version along with the Rails upgrade?

One thing I would try is opening the input file in binary mode:

uploaded_file = uploader.upload(File.open(tempfile.path, "rb"))

This is needed anyway (I’m a bit surprised it was working before), but maybe Rails 5 changed some default encodings or something :man_shrugging:
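
To see what binary mode changes, compare the encodings of what you read back (illustrative; the exact encoding in the first case depends on your Encoding.default_external):

data = File.open(tempfile.path) { |f| f.read }
data.encoding #=> #<Encoding:UTF-8> (text mode applies the default external encoding)

data = File.open(tempfile.path, "rb") { |f| f.read }
data.encoding #=> #<Encoding:ASCII-8BIT> (raw bytes, no transcoding or newline translation)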

Other than that, I can only help if you can reproduce the issue in isolation, without Rails or DocRaptor, or in a fresh Rails app if the presence of Rails is what’s causing the issue.
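
Something like this standalone script would do; it mirrors the storage setup from your initializer, so the env var names are assumed from there:

# repro.rb -- run with: ruby repro.rb path/to/large.pdf
require 'shrine'
require 'shrine/storage/s3'

s3_options = {
  access_key_id: ENV['aws_access_key_id'],
  secret_access_key: ENV['aws_secret_access_key'],
  region: ENV['aws_region'],
  bucket: ENV['aws_bucket']
}

Shrine.storages = { store: Shrine::Storage::S3.new(prefix: 'store', **s3_options) }

# Upload the file in binary mode and print the resulting metadata
uploaded_file = Shrine.new(:store).upload(File.open(ARGV[0], 'rb'))
puts uploaded_file.to_json

If that hangs too, the problem is in the AWS SDK layer rather than in Rails.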

Upgrading aws-sdk-s3 and any other related dependencies fixed the issue. Thanks.
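
In case it helps anyone else, the change was essentially along these lines (the version constraint here is illustrative, not the exact one I used):

# Gemfile
gem 'aws-sdk-s3', '~> 1' # make sure the modular AWS SDK for S3 is current

followed by running bundle update aws-sdk-s3.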