Shrine

Memory usage when copying remote files to local?

So, I am doing a thing where I need to copy some large files (1GB and up) from S3 to a local machine.

As I’m debugging it, I’m getting a hint that RAM usage grows with the size of the file, as if the process may wind up using as much memory as the file is large. I’m not sure of this, since I don’t yet know how to measure or profile it adequately, but it’s my suspicion.

Here’s what my code looks like, from a shrine file using the S3 adapter.

new_temp_file = Tempfile.new(some_name, :encoding => 'binary')
shrine_uploaded_file.open(rewindable: false) do |io|
  new_temp_file.write io.read until io.eof?
end

I wrote that a while ago, but I wrote it trying to be careful to be performant, including trying not to buffer the whole thing in RAM.

Does anyone have any thoughts?

Do you think that code should end up using RAM for the entire file?

Do you have any tips for ways to write this to be even more performant, including with regard to RAM usage?
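For what it’s worth, as far as I know a standard Ruby `read` with no length argument reads everything up to EOF into a single String, so one call can hold the whole file in memory. Here is a minimal sketch of a chunked copy instead, assuming the object Shrine yields responds to `read(length)` like a regular Ruby IO. The helper name `copy_in_chunks` and the 1 MB `CHUNK_SIZE` are made up for illustration, and a StringIO stands in for the remote file.

```ruby
require "stringio"
require "tempfile"

CHUNK_SIZE = 1024 * 1024 # 1 MB per read; an arbitrary choice for this sketch

# Hypothetical helper: copies src to dest one chunk at a time and
# returns the total number of bytes written.
def copy_in_chunks(src, dest, chunk_size = CHUNK_SIZE)
  total = 0
  # read(length) returns nil at EOF, which ends the loop
  while (chunk = src.read(chunk_size))
    total += dest.write(chunk)
  end
  total
end

source = StringIO.new("x" * (3 * CHUNK_SIZE + 5)) # stand-in for the remote IO
tempfile = Tempfile.new("copy-sketch", binmode: true)
copied = copy_in_chunks(source, tempfile)
tempfile.flush
```

With this shape, only one chunk should be held at a time, so memory use should not scale with the file size.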

Have you compared this method to the native Shrine @attach[ment|er].download method? Is the problem the same, or worse when you use that?

Walter

I haven’t. I think the relevant method is UploadedFile#download, no?

I can’t easily reproduce this. I’m not in a situation where I can just “run and see how much memory it uses”, and I’m still trying to figure out how to get there (any advice welcome). So I can’t easily compare and contrast; this is just detective work I’m doing from a running application.
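One rough way to get a number, as a sketch only: sample the process’s resident set size before and after the copy and log the difference. This assumes Linux, since it reads /proc/self/status. The helper name `rss_kb` is hypothetical, and the big string is just a stand-in for the work being measured.

```ruby
# Hypothetical helper: resident set size of this process in kilobytes,
# read from /proc/self/status (Linux only). Returns nil if unavailable.
def rss_kb
  status = File.read("/proc/self/status")
  match = status.match(/^VmRSS:\s+(\d+)\s+kB/)
  match && match[1].to_i
rescue SystemCallError
  nil
end

before = rss_kb
buffer = "x" * (50 * 1024 * 1024) # stand-in for the work being measured
after = rss_kb

if before && after
  puts "RSS grew by roughly #{after - before} kB"
else
  puts "RSS not available on this platform"
end
```

Logging this around the copy in the running application would at least show whether growth tracks the file size.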

I can’t recall why I didn’t use that method in the first place… maybe I thought I could be more performant and minimize use of RAM more? Maybe with the rewind: false?

(Looking at the source code right now and considering switching, I thought at first maybe I could do download(rewind:false) and have the option passed down to the underlying open, but nope, that results in validate!': unexpected value at params[:rewind] (ArgumentError))

Hmm. Thanks for the feedback, definitely curious if anyone super familiar with the internals has some ideas.

Actually this is a really good point @walterdavis, thanks!

Why am I reinventing the wheel here? I don’t remember. I think I will switch to the built-in UploadedFile#download method.

You can pass rewindable: false to it, to avoid making an extra on-disk buffer copy, for possibly better performance with very large files, although it’s not documented very well.

You can’t choose your own name for the tempfile, which I was doing for making some kinds of debugging easier. But I think I can give that up. (Hypothetically could try to PR a feature to shrine to let you do it, but I don’t think it’s worth it to me right now).

If I was going to keep doing it myself, it would make sense to look at shrine’s implementation and copy parts of it… for instance, using IO.copy_stream (which shrine does) seems a way better idea than my own manual ruby copying of bytes, and likely to be a big perf advantage.
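A minimal sketch of that approach with Ruby’s IO.copy_stream, assuming the source object responds to `read` or `readpartial`. A StringIO stands in for the remote file here, and the tempfile name is made up for illustration.

```ruby
require "stringio"
require "tempfile"

source = StringIO.new("a" * 200_000) # stand-in for the remote IO
tempfile = Tempfile.new("copy-stream-sketch", binmode: true)

# IO.copy_stream copies from source to destination without a hand-written
# loop and returns the number of bytes copied.
copied = IO.copy_stream(source, tempfile)
tempfile.flush
```

This still lets me pick my own tempfile name, which was the one thing I’d give up by switching to download.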

Yeah, copying bytes sounds like something you have to get EXACTLY right…

Walter