Migrating data from CarrierWave

I’ve got a table full of data from another application that I am migrating to Shrine from CarrierWave. The old CW file name column has been copied into the new (Shrine) file_data column. That part worked fine. But now, when I want to open these new files to convert their storage, Shrine insists on trying to parse the file_data before I ask for it, and since it’s not valid JSON, it blows up with

JSON::ParserError: 783: unexpected token at 'elizabeth-spivak-p-wVnN70.pdf'

Is there any way that I could guard against that, so my migration could continue?

Or would you suggest that I keep the filename in a different column and not try to use the file_data column until I store the file with Shrine?

I’d rather not keep the legacy column around, as the new app will have no need for it after conversion.

  task migrate_files: :environment do
    file_upload_root = '/data/web/apps/fapd_emc/evaluation/file'
    puts 'Beginning storage conversion'
    Evaluation.where(Arel.sql('file_data is not null')).find_each do |evaluation|
      puts "Converting evaluation ##{evaluation.id}"
      # We assume JSON file_data is managed by a Gem like Shrine, not a filepath.
      begin
        file_data_is_json = JSON.parse(evaluation.file_data_before_type_cast)
      rescue JSON::ParserError
        # this is not an error: a legacy CarrierWave filename is not JSON
      end
      # Skip records that already hold JSON (already converted)
      next if file_data_is_json

      path = File.join(file_upload_root, evaluation.id.to_s, evaluation.file_data_before_type_cast)
      # Only attempt conversion if the old file is found
      if File.file?(path)
        evaluation.update(file: File.open(path), file_data: nil)
      else
        puts "Cannot locate FAPD EMC file data for Evaluation ##{evaluation.id}"
        evaluation.update(file_data: nil)
      end
    end
  end

As you can see, I have a guard in there against trying to re-convert JSON, but the error happens way before I can reach that point, when the record is first initialized.
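For reference, the guard can be isolated into a small helper that only looks at the raw column string, such as the value returned by file_data_before_type_cast in the task above. This is a minimal sketch: the name `shrine_json?` is hypothetical, and it assumes that converted rows hold a JSON object while legacy rows hold a bare filename.

```ruby
require 'json'

# Hypothetical helper (not part of Shrine): returns true when the raw column
# value parses as a JSON object, false for a bare legacy filename.
def shrine_json?(raw)
  return false if raw.nil? || raw.empty?

  # Only a JSON object counts; a bare number or string is not attachment data.
  JSON.parse(raw).is_a?(Hash)
rescue JSON::ParserError
  false
end

puts shrine_json?('elizabeth-spivak-p-wVnN70.pdf')
puts shrine_json?('{"id":"abc.pdf","storage":"store","metadata":{}}')
```

Because the helper never touches the attachment accessor, it cannot itself trigger the parse that raises the error.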


Hi, parsing attached file data happens when the attacher is initialized on the model instance. Loading the record from the database does not automatically initialize the attacher, as that would be a potential performance issue. Something else in the model must be referencing the attacher (explicitly or implicitly) as soon as the model is initialized.
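
The behavior described above can be modeled in plain Ruby. This is only a toy sketch, not Shrine's actual implementation: `ToyAttacher`, `ToyRecord`, and `EagerRecord` are made-up names, and the constructor in `EagerRecord` stands in for whatever in the real model touches the attacher during initialization (a callback or a default value, for example, which is an assumption on my part).

```ruby
require 'json'

# Simplified stand-in for an attacher: the column is parsed in the constructor.
class ToyAttacher
  attr_reader :file

  def initialize(raw)
    @file = raw && JSON.parse(raw)
  end
end

# Simplified stand-in for a model row: loading stores only the raw string.
class ToyRecord
  attr_reader :file_data

  def initialize(file_data)
    @file_data = file_data
  end

  # The attacher is built, and the column parsed, on first access only.
  def file_attacher
    @file_attacher ||= ToyAttacher.new(file_data)
  end
end

# Something in the model references the attacher during initialization.
class EagerRecord < ToyRecord
  def initialize(file_data)
    super
    file_attacher
  end
end

legacy = 'elizabeth-spivak-p-wVnN70.pdf'

record = ToyRecord.new(legacy) # nothing is parsed here, so nothing raises
puts record.file_data

begin
  EagerRecord.new(legacy) # parses immediately and raises
rescue JSON::ParserError => e
  puts "raised during initialization: #{e.class}"
end
```

If the real model behaves like `EagerRecord`, the backtrace of the JSON::ParserError should point at the code that references the attacher.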