Multipart Upload to S3 on Heroku

Hi,

I am pretty new to Ruby on Rails and am trying to set up an uploader for large files (>5GB), so the uppy s3 multipart plugin has been a big help.

The upload works completely fine in my local environment. However, I am currently facing an issue with the multipart upload to S3 on Heroku.

Upload (31606ms) – {:storage=>:store, :location=>"fecde9eada53f05b31e471da28f27144.mp4", :io=>ActionDispatch::Http::UploadedFile, :upload_options=>{}, :uploader=>FileUploader}

The above file is approximately 1.2GB.
I have also tested with a 5.2GB file; the same operation took approximately 3 minutes.
Heroku enforces a strict 30-second limit on all requests, so this upload causes the request to time out with Heroku's H12 request timeout error. Am I doing something wrong? A quick search doesn't turn up anyone else with the same problem.

Below is my shrine.rb file:

require "shrine"
require "shrine/storage/file_system"
require "shrine/storage/s3"

s3_options = {
  bucket: ENV['AWS_S3_BUCKET'], 
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  region: ENV['AWS_S3_BUCKET_REGION'],
}

Shrine.storages = {
  cache: Shrine::Storage::S3.new(prefix: "cache", **s3_options),
  store: Shrine::Storage::S3.new(**s3_options),
}

Shrine.plugin :activerecord                  # loads Active Record integration
Shrine.plugin :model, cache: false
Shrine.plugin :determine_mime_type
Shrine.plugin :cached_attachment_data        # enables retaining cached file across form redisplays
Shrine.plugin :restore_cached_data           # extracts metadata for assigned cached files
Shrine.plugin :uppy_s3_multipart             # uppy s3 multipart support
Shrine.plugin :instrumentation, notifications: ActiveSupport::Notifications  # logging for easier debug

Below is the controller action:

def create
  @raw_data = RawData.new(raw_data_params)  # upload happens here
  authorize @raw_data
  @raw_data.project_id = params[:project_id]
  if @raw_data.save!
    redirect_to project_path(@raw_data.project.id), notice: 'File uploaded'
  end
end

If you’re committed to staying on Heroku, you should definitely look into direct-to-S3 uploads with Uppy. There’s an article or two, and sample code, on the Shrine site. This is better design anyway: the long upload you are experiencing also means your server is tying up a complete Rails process just to manage that upload. That process is fully occupied even though only bytes are trickling from the browser to the server; it could be serving other requests, but it can’t.
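As a rough sketch of what the server side of that could look like: Shrine ships a presign_endpoint plugin whose Rack app you can mount in routes.rb, and Uppy then fetches presigned parameters from it so the browser uploads straight to the bucket. The /s3/params path below is just a convention; the Shrine guides cover the full setup, including the Uppy client side.

# config/initializers/shrine.rb (addition)
Shrine.plugin :presign_endpoint

# config/routes.rb
Rails.application.routes.draw do
  # Uppy requests presigned upload parameters here,
  # then the browser PUTs the file directly to S3
  mount Shrine.presign_endpoint(:cache) => "/s3/params"
end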

Walter

Hi,

Thanks for the reply Walter.

As far as I know, direct-to-S3 uploads with Uppy only support files up to 5GB in size?

Is there a way to bypass this limit without resorting to multipart uploads?

Due to some requirements, the uploader must be able to upload files larger than 5GB as well.

HS

I think, if you’re uploading that much data over a web form, you’re going to want multipart upload anyway, just for the ability to resume after a failure. Can you resume a failed upload of a monolithic file?

Walter

Yes, it can be resumed.

I do apologize for the poor phrasing on my end; it isn’t the uploading that is failing. The upload goes through, the record is saved, and the file can be downloaded afterwards with no issues. The problem is that the request takes too long (>30 seconds) on large files (>1GB), causing Heroku to time out internally, which in turn makes the app present an error page.

I was just wondering how others typically handle uploads of large files (>5GB) on Heroku without the request timing out internally. Now that I think about it, perhaps a workaround is simply to redirect any Heroku application error caused by upload timeouts to a friendly page, since technically nothing has failed.

HS

Typically people avoid uploading files to Heroku for a couple of reasons: storage is ephemeral, dyno resources get consumed, there is little control over server restarts/config, etc. AWS S3 or alternatives are widely used, even more so for large files.

You can try to work around the timeout on Heroku, but even Heroku suggests using direct uploads to S3.

For AWS, the 5GB limit applies to a single PUT; with multipart upload you can go up to 5TB by choosing the part size and the number of parts/requests the upload takes (see Amazon S3 Multipart Upload Limits).
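To put numbers on that: per the S3 documentation, objects max out at 5TB, a multipart upload may have at most 10,000 parts, and each part must be 5MB–5GB (the last part may be smaller). A quick Ruby sketch of picking a part size for a given file:

# S3 multipart limits, per the AWS documentation
MAX_PARTS     = 10_000
MIN_PART_SIZE = 5 * 1024 ** 2  # 5 MiB (the last part may be smaller)

# smallest allowed part size that fits the file in 10,000 parts
def min_part_size(file_size)
  [(file_size.to_f / MAX_PARTS).ceil, MIN_PART_SIZE].max
end

file_size = 5.2 * 1024 ** 3  # the 5.2GB file mentioned above

part_size = min_part_size(file_size)      # => 5_242_880 (the 5 MiB floor suffices)
parts     = (file_size / part_size).ceil  # => 1065 part requests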

Hope that helps.

The built-in Shrine S3 storage can already upload files larger than 5GB on the server side; that isn’t something the uppy-s3_multipart gem provides.

What uppy-s3_multipart provides are the endpoints for Uppy’s aws-s3-multipart plugin, which uploads the file directly to S3 in multiple chunks. It’s essentially the same thing Shrine’s S3 storage is doing for you now, but done from the client directly to S3.
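For reference, wiring those endpoints up is mostly a matter of mounting the Rack app the gem provides. This follows the uppy-s3_multipart README; /s3/multipart is the path Uppy’s aws-s3-multipart plugin requests relative to its companionUrl:

# config/routes.rb
Rails.application.routes.draw do
  # create/sign/complete/abort endpoints for Uppy's aws-s3-multipart plugin
  mount Shrine.uppy_s3_multipart(:cache) => "/s3/multipart"
end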

This completely avoids Heroku’s 30s request limit. The only thing to keep in mind is that the restore_cached_data plugin might add some overhead when you submit the form (you can decide whether that’s acceptable for you). Also, Shrine still needs to copy the directly uploaded cached file to permanent storage on the server side after the attachment is assigned. That should normally be fast, but if it happens to take too long you can use the backgrounding plugin to move it into a background job, as sketched below.
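A minimal sketch of that backgrounding setup with Sidekiq, following the pattern in Shrine’s backgrounding documentation (PromoteJob is just an example job name):

# config/initializers/shrine.rb (addition)
Shrine.plugin :backgrounding
Shrine::Attacher.promote_block do
  PromoteJob.perform_async(self.class.name, record.class.name, record.id,
                           name, file_data)
end

# app/jobs/promote_job.rb
class PromoteJob
  include Sidekiq::Worker

  def perform(attacher_class, record_class, record_id, name, file_data)
    attacher_class = Object.const_get(attacher_class)
    record         = Object.const_get(record_class).find(record_id)

    attacher = attacher_class.retrieve(model: record, name: name, file: file_data)
    attacher.atomic_promote  # copies the cached file to :store and persists
  rescue Shrine::AttachmentChanged, ActiveRecord::RecordNotFound
    # the attachment changed or the record was deleted before promotion finished
  end
end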