Shrine

Converting a complex process to a derivative

I am upgrading a Shrine 2 application to Shrine 3, and along with that, I need to move my processing plugin workflow into derivatives.

Here’s where I am starting:

Because my command-line conversion wants to run from an actual file, not an IO, I run this process in an after_action in the controller:

  after_action :parse_document, only: %i[create update]
...
  def parse_document
    return unless document_params[:parse] == '1'
    return unless @document.file

    Rails.logger.info 'parsing to HTML'

    begin
      @document.reload if @document.valid?
      @document.file_attacher.promote(action: :convert)
      @document.save!
    rescue PandocCommandError
      @document.errors.add :file, "could not be converted to html."
      flash.discard(:notice)
      flash[:error] = @document.errors.full_messages.first
    end
  end

So the interaction runs like this:

  1. User uploads a DOCX file along with a bunch of associated form data.
  2. The after_action filter notices the file, and triggers the conversion.
    • This triggers a download of the original.
    • Pandoc reads the original and creates an HTML equivalent in a tempfile.
    • That tempfile is read in a separate Ruby process, and the text is assigned to the attachment’s record and saved.

Yes, this is fairly involved, and there’s probably a better way to do it, but I’m having difficulty understanding the migration guide and where to put the pieces. So far, I’ve got this:

But I’m not clear on how to do the equivalent of the @document.file_attacher.promote(action: :convert) call that I currently have in the controller.

Any suggestions how to drag this across the line?

If there is a way to pipe the content of the original file directly into Pandoc instead, I could do all of this in one step, without the whole two-step of the tempfile and the upload followed by a download. If that would make this whole problem simpler (so the controller would not need to be involved), I would be all in favor of that!
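For what it’s worth, here is a minimal sketch of the piping idea, not a tested Shrine recipe. It assumes the installed Pandoc accepts DOCX on standard input (worth checking by hand with `pandoc -f docx -t html5 < some.docx` first). The helper name `convert_via_stdin` and its `command:` parameter are made up for illustration; `PandocCommandError` is the same error class the controller already rescues.

```ruby
require "open3"

# Same error class the controller rescues; defined here only if it is missing.
PandocCommandError = Class.new(StandardError) unless defined?(PandocCommandError)

# Hypothetical helper: feeds the bytes of an IO to a command's stdin and
# returns whatever the command writes to stdout, tagged as UTF-8.
# The default command assumes Pandoc can read DOCX from stdin.
def convert_via_stdin(io, command: %w[pandoc -f docx -t html5])
  # binmode: true keeps the DOCX (a zip archive) from being mangled in transit.
  stdout, stderr, status = Open3.capture3(*command, stdin_data: io.read, binmode: true)
  raise PandocCommandError, stderr.chomp unless status.success?

  stdout.force_encoding("UTF-8")
end

# Possible use with a Shrine attachment (assumption: UploadedFile#open with a block):
#   @document.file.open { |io| @document.body_html = convert_via_stdin(io) }
```

With this shape there is no intermediate HTML tempfile to read back, and the whole document is held in memory once as input and once as output, which should be fine for typical DOCX sizes.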

Thanks in advance,

Walter

I think I may have a solution. Take all of this out of the upload path, and just handle it separately:

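# frozen_string_literal: true

require "open3"
require "tempfile"

PandocCommandError = Class.new(StandardError)

# Converts a record's attached DOCX file to HTML with Pandoc.
class DocumentExtractor
  def initialize(record)
    @record = record
  end

  # Sets body_html on the record and returns the record.
  def process
    return @record unless @record.parse?

    output_html = Tempfile.new(%w[pandoc .html], binmode: true)
    # download returns a Tempfile; Pandoc needs its path, not the object itself.
    original = @record.file.download
    _stdout, stderr, status = Open3.capture3(
      "pandoc", "-f", "docx", original.path, "-t", "html5", "-o", output_html.path
    )
    raise PandocCommandError, stderr.chomp unless status.success?

    # Tempfile#open ignores a block, so read the result back by path instead.
    @record.body_html = File.read(output_html.path, encoding: "UTF-8")
    @record
  ensure
    # Clean up both tempfiles (either may be nil after the early return).
    original&.close!
    output_html&.close!
  end
end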

Now I have a PORO I can call from my documents controller (DocumentExtractor.new(@document).process, which sets body_html and returns the record) or even from the model if that works better.

I’ll update if anything significant changes here, mostly so that future-me remembers what happened…

Thanks again for Shrine,

Walter