Restrict metadata on derivatives

I am looking at optimizing the size of my database, and I see that currently I am storing metadata for derivatives that I do not care about. For example:

{:s1800=>
  #<FilesUploader::UploadedFile storage=:images_store id="ax6IXppOgtcNdiBTQRtZM-43a6adfad0.jpg" metadata={"filename"=>"image_processing20200402-26349-v44ddl.jpg", "size"=>407010, "mime_type"=>"image/jpeg", "width"=>1350, "height"=>1800, "exif_time"=>nil}>,
 :s1200=>
  #<FilesUploader::UploadedFile storage=:images_store id="ax6IXppOgtcNdiBTQRtZM-a7d7f3de69.jpg" metadata={"filename"=>"image_processing20200402-26349-nojnf6.jpg", "size"=>209382, "mime_type"=>"image/jpeg", "width"=>900, "height"=>1200, "exif_time"=>nil}>,
 :h600=>
  #<FilesUploader::UploadedFile storage=:images_store id="ax6IXppOgtcNdiBTQRtZM-a9adaabcde.jpg" metadata={"filename"=>"image_processing20200402-26349-b77k06.jpg", "size"=>68022, "mime_type"=>"image/jpeg", "width"=>450, "height"=>600, "exif_time"=>nil}>,
 :h300=>
  #<FilesUploader::UploadedFile storage=:images_store id="ax6IXppOgtcNdiBTQRtZM-3efd00d470.jpg" metadata={"filename"=>"image_processing20200402-26349-zn1g3w.jpg", "size"=>24475, "mime_type"=>"image/jpeg", "width"=>225, "height"=>300, "exif_time"=>nil}>}

Is there a way to not store exif_time (a custom one added using add_metadata) and filename (I guess that one is a default?) in the metadata of the derivatives?

I looked at the derivatives and metadata plugins, but I did not find a way to filter them.

As far as I know, exif_time has to be added manually; I could be wrong, though.

Some possible lines of investigation:

  • Would you be able to double-check the relevant uploader to see whether exif_time is being extracted and added manually?

  • If you are doing direct uploads, perhaps check that your JavaScript library is not adding exif_time as part of the metadata?

Yes, exif_time is one of my metadata fields, but I want it only on the original file, not on all derivatives (thumbnails, for which I care about neither the additional metadata nor the filename).

That’s a good point; Shrine shouldn’t require you to store metadata you don’t care about. One hack is to (ab)use the add_metadata plugin to remove the values from the :metadata hash:

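add_metadata do |io, metadata:, derivative: nil, **|
  # For derivatives, drop the keys we don't want persisted (mutates the hash in place)
  metadata.reject! { |k, _| %w[exif_time filename].include?(k) } if derivative
  nil # return nil so no extra metadata gets merged
end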
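The hack works because the block receives the same hash that ends up stored, so it has to mutate it in place (`reject!`, not `reject`) and return `nil` so nothing is merged back. Below is a minimal plain-Ruby model of that flow; `run_add_metadata` is a hypothetical, simplified stand-in for how the plugin invokes the block, not Shrine's actual code.

```ruby
# Hypothetical, simplified model of an add_metadata block invocation:
# the block receives the already-extracted hash and may return a hash to merge.
def run_add_metadata(metadata, derivative: nil)
  result = yield(metadata, derivative)
  metadata.merge!(result) if result.is_a?(Hash)
  metadata
end

UNWANTED_ON_DERIVATIVES = %w[exif_time filename].freeze

cleanup = lambda do |metadata, derivative|
  # Mutate in place; a non-bang `reject` would return a copy that is discarded.
  metadata.reject! { |key, _| UNWANTED_ON_DERIVATIVES.include?(key) } if derivative
  nil # nothing to merge
end

original = { "filename" => "photo.jpg", "size" => 407_010, "exif_time" => nil }
thumb    = { "filename" => "image_processing-abc.jpg", "size" => 24_475, "exif_time" => nil }

p run_add_metadata(original.dup, &cleanup)                    # keys unchanged
p run_add_metadata(thumb.dup, derivative: :h300, &cleanup)    # only "size" left
```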

Alternatively, for exif_time specifically, you could exclude it by using the hash form of add_metadata and returning the hash only for the original file:

metadata_method :exif_time
add_metadata do |io, derivative: nil, **|
  # Returning nil for derivatives means nothing is added to their metadata
  { "exif_time" => ... } unless derivative
end
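The hash form can be modelled the same way: whatever hash the block returns is merged, and `nil` adds nothing. In this sketch `add_custom_metadata` is a hypothetical stand-in for the plugin, and `fake_exif_time` is a placeholder for the real EXIF extraction, which the example above leaves elided.

```ruby
# Hypothetical stand-in for the hash form of add_metadata:
# a returned hash is merged, nil means "add nothing".
def add_custom_metadata(metadata, derivative: nil)
  extra = yield(derivative)
  metadata.merge!(extra) if extra
  metadata
end

# Placeholder for the real EXIF extraction (illustrative only).
fake_exif_time = "2020-04-02T10:15:00Z"

exif_only_on_original = lambda do |derivative|
  { "exif_time" => fake_exif_time } unless derivative
end

original = add_custom_metadata({ "size" => 407_010 }, &exif_only_on_original)
thumb    = add_custom_metadata({ "size" => 24_475 }, derivative: :h300, &exif_only_on_original)

puts original.key?("exif_time") # => true
puts thumb.key?("exif_time")    # => false
```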

The reason I decided to store nil values is to communicate that metadata extraction was attempted but the extractor returned nil.
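Storing `nil` keeps two cases apart that a missing key would merge: "never extracted" and "extracted, nothing found". A small sketch of reading that distinction back; `exif_status` is a hypothetical helper, not part of Shrine.

```ruby
# Hypothetical helper: a present-but-nil key means extraction was attempted.
def exif_status(metadata)
  if !metadata.key?("exif_time")
    :not_attempted
  elsif metadata["exif_time"].nil?
    :attempted_but_empty
  else
    :present
  end
end

puts exif_status({ "exif_time" => nil })                   # => attempted_but_empty
puts exif_status({ "size" => 24_475 })                      # => not_attempted
puts exif_status({ "exif_time" => "2020-04-02T10:15:00Z" }) # => present
```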

Regarding filename, I went back and forth on whether to keep it for derivatives, because, as you said, it’s not useful there. It’s currently used in #generate_location to determine the file extension, but it isn’t needed afterwards.

One reason I didn’t feel comfortable removing it was backwards compatibility: I didn’t want UploadedFile#extension to become nil in certain scenarios, and if people were using UploadedFile#original_filename to read the file extension, I didn’t want to break that either. I distinctly remember thinking about this when preparing the 3.0 release, but in the end I decided to keep things as they were (maybe because there was too much other work).
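The compatibility concern can be pictured with a simplified extension lookup that prefers the extension in the storage id and falls back to the original filename. `extension_for` is hypothetical and the lookup order is an assumption for illustration; this is not Shrine's `UploadedFile#extension` source.

```ruby
# Hypothetical approximation (not Shrine's source): take the extension from
# the storage id, fall back to the stored filename, otherwise return nil.
def extension_for(id, metadata)
  ext = File.extname(id).delete_prefix(".")
  ext = File.extname(metadata["filename"].to_s).delete_prefix(".") if ext.empty?
  ext.empty? ? nil : ext.downcase
end

p extension_for("ax6IXppOgtcNdiBTQRtZM-43a6adfad0.jpg", {})                      # => "jpg"
p extension_for("ax6IXppOgtcNdiBTQRtZM-43a6adfad0", { "filename" => "photo.JPG" }) # => "jpg"
p extension_for("ax6IXppOgtcNdiBTQRtZM-43a6adfad0", {})                          # => nil
```

Under these assumptions, dropping the filename only changes the result in the last case, where the id itself carries no extension.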

Thanks, not storing the filename works fine in my use case, and your solution works.

Maybe there could be an option not to store the filename, or a plugin to process metadata before it is stored, so there is a clear place to clean up anything that is not needed.

There could also be an option for add_metadata not to store nil values for this metadata?
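Such an option essentially drops nil values before merging. A minimal sketch, assuming a hypothetical `merge_metadata` helper with a `skip_nil:` flag (the flag name mirrors the option discussed in this thread; the helper is not the plugin's code):

```ruby
# Hypothetical helper: optionally drop nil values before merging.
def merge_metadata(metadata, extracted, skip_nil: false)
  extracted = extracted.compact if skip_nil # Hash#compact removes nil values
  metadata.merge(extracted)
end

base = { "size" => 24_475 }
p merge_metadata(base, { "exif_time" => nil }).key?("exif_time")                 # => true
p merge_metadata(base, { "exif_time" => nil }, skip_nil: true).key?("exif_time") # => false
```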

Both of your ideas sound good to me: a plugin for processing metadata before it is stored, and an option for add_metadata not to store nil values.

The metadata-processing plugin could also run when the metadata is just copied, which happens when a Shrine::UploadedFile is being uploaded (e.g. uploading a cached file to permanent storage). This would, for example, allow extracting metadata only for validation and then leaving it out before persisting to the database.
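A plain-Ruby sketch of that idea: one hook runs on both the extraction path and the copy path, and it keeps validation-only keys in the cache but strips them when targeting permanent storage. All names here (`process_metadata`, `upload_io`, `upload_uploaded_file`, the `:cache`/`:store` keys) are hypothetical and only illustrate the flow.

```ruby
VALIDATION_ONLY_KEYS = %w[exif_time].freeze

# Hypothetical hook: decides what gets persisted for a given target storage.
def process_metadata(metadata, storage:)
  return metadata if storage == :cache # keep everything while validating
  metadata.reject { |key, _| VALIDATION_ONLY_KEYS.include?(key) }
end

# Extraction path: metadata freshly extracted from a new IO (faked here).
def upload_io(extracted, storage:)
  process_metadata(extracted, storage: storage)
end

# Copy path: an existing uploaded file (e.g. a cached one) is uploaded to
# another storage; its metadata is copied, not re-extracted, then processed.
def upload_uploaded_file(existing_metadata, storage:)
  process_metadata(existing_metadata.dup, storage: storage)
end

cached = upload_io({ "size" => 407_010, "exif_time" => "2020-04-02T10:15:00Z" }, storage: :cache)
puts cached.key?("exif_time") # => true (available for validation)

stored = upload_uploaded_file(cached, storage: :store)
puts stored.key?("exif_time") # => false (left out of the database)
```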

If you have time, I’d definitely appreciate PRs for this :slightly_smiling_face:

PR for the skip_nil option here: https://github.com/shrinerb/shrine/pull/458

The metadata-processing plugin looks a bit more complex, and I am not sure I will have time for a PR soon. One of the issues is that the metadata processing needs to happen after all metadata has been extracted, and I am not sure where the right place for this is.
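In plain Ruby, one way to make a step run after all other extraction is to make it the outermost method in the `super` chain: call `super` first, then post-process the result. Shrine plugins override methods in modules that call `super`, so a plugin's position in that chain depends on load order. The sketch below is a plain-Ruby model with hypothetical module names, not Shrine code. It uses `Module#prepend` so the cleanup also wraps a module included after it, and a hash stands in for the IO.

```ruby
# Hypothetical model of a plugin-style super chain (plain Ruby, not Shrine).
module CoreExtraction
  def extract_metadata(io, **_options)
    { "filename" => io[:filename], "size" => io[:data].bytesize }
  end
end

module Dimensions
  def extract_metadata(io, **options)
    super.merge("width" => io[:width], "height" => io[:height])
  end
end

module ExifTime
  def extract_metadata(io, **options)
    super.merge("exif_time" => nil)
  end
end

# Calls super first, so it sees the fully extracted hash, then filters it.
module CleanupDerivativeMetadata
  def extract_metadata(io, derivative: nil, **options)
    metadata = super
    return metadata unless derivative
    metadata.reject { |key, _| %w[exif_time filename].include?(key) }
  end
end

class Uploader
  include CoreExtraction
end

Uploader.include(Dimensions)
Uploader.prepend(CleanupDerivativeMetadata)
Uploader.include(ExifTime) # included after the prepend, but still wrapped by it

io = { filename: "thumb.jpg", data: "fake bytes", width: 225, height: 300 }
uploader = Uploader.new
p uploader.extract_metadata(io).keys                    # => ["filename", "size", "width", "height", "exif_time"]
p uploader.extract_metadata(io, derivative: :h300).keys # => ["size", "width", "height"]
```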

@renchap @janko Can I work on the second PR that was mentioned in this thread?

I will not have time to do it anytime soon, so feel free to have a go at it!