Description
Hi there,
Awesome project. I got nerd-sniped when my disk didn't have enough space to download the ChromeOS image.
I was wondering if we could avoid downloading the whole ChromeOS image to disk or to memory, and instead download only the parts we need on each "read".
Technical feasibility
As I see it, the main mechanism used in inputstreamhelper is feeding a file-like object into ZipFile:
| bstream = ZipFile(compat_path(imgpath), 'r').open(os.path.basename(imgpath).strip('.zip'), 'r') # pylint: disable=consider-using-with |
ZipFile will only read some metadata, such as the end of central directory by using seek/tell/read:
https://github.com/python/cpython/blob/ffa505b580464d9d90c29e69bd4db8c52275280a/Lib/zipfile.py#L1343
You then call the open() function on the ZipFile object, which returns an object of type ZipExtFile. Again, ZipExtFile only reads some metadata from the file at this point.
Your code then calls seek/read/close etc. on the ZipExtFile object. ZipExtFile does the zip-specific work, but it in turn only calls seek/tell/read on the originally supplied file-like object, and only when asked.
Summarised: Any file-like object should work with the current ZipFile approach.
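To illustrate the point, here is a minimal sketch that hands ZipFile an in-memory file-like object and counts how many bytes it actually asks for. `CountingFile` and `build_archive` are hypothetical names made up for this example, and the archive is a stored (uncompressed) dummy, not a real ChromeOS image.

```python
import io
import zipfile


class CountingFile(io.BytesIO):
    """In-memory file-like object that counts the bytes handed to callers."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_archive(member_size=1_000_000):
    """Build a dummy archive with one stored (uncompressed) member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("image.bin", b"\x00" * member_size)
    return buf.getvalue()


archive = build_archive()
source = CountingFile(archive)

# ZipFile only uses seek/tell/read on the object it is given.
with zipfile.ZipFile(source) as zf:
    with zf.open("image.bin") as member:
        header = member.read(16)

print(f"archive size: {len(archive)} bytes, bytes read: {source.bytes_read}")
```

Only the archive metadata and a small block at the start of the member are read. One caveat: for a compressed member, ZipExtFile has to decompress from the start of the member to reach a given offset, so a forward seek still reads the compressed bytes in between.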
Proposal
Create a file-like HttpFile class that implements seek/tell/read etc. and uses the HTTP Range feature to only fetch certain parts of the zip file from the Google servers. I guess the class will need to be clever about the chunks it caches (e.g. it always keeps a 100MB chunk in memory), so that not every read() call will result in an HTTP request to the Google servers. Instead of downloading the ChromeOS image to disk, pass the HttpFile object into ZipFile.
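A rough sketch of what such a class could look like, under a few assumptions: the server answers a HEAD request with a Content-Length header and honours Range with a 206 response, and a single aligned chunk is cached in memory. The names `http_range_fetcher`, `open_remote_zip`, the `fetch` callable and the `requests` counter are all made up for this example and are not part of inputstreamhelper.

```python
import io
import urllib.request
import zipfile


def http_range_fetcher(url):
    """Return (size, fetch) for url; fetch(start, end) returns bytes [start, end)."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])  # assumes the header is present

    def fetch(start, end):
        # HTTP byte ranges are inclusive at both ends, hence end - 1.
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end - 1}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:  # 206 Partial Content
                raise OSError("server did not honour the Range header")
            return resp.read()

    return size, fetch


class HttpFile(io.RawIOBase):
    """Read-only, seekable file-like object backed by ranged fetches."""

    def __init__(self, size, fetch, chunk_size=1024 * 1024):
        super().__init__()
        self._size = size
        self._fetch = fetch
        self._chunk_size = chunk_size
        self._pos = 0
        self._chunk_start = 0
        self._chunk = b""  # the single cached chunk
        self.requests = 0  # number of fetch calls, handy for testing

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        if pos < 0:
            # Regular files raise OSError here, and zipfile relies on that.
            raise OSError("negative seek position")
        self._pos = pos
        return pos

    def _load_chunk(self, pos):
        start = (pos // self._chunk_size) * self._chunk_size
        end = min(start + self._chunk_size, self._size)
        self._chunk = self._fetch(start, end)
        self._chunk_start = start
        self.requests += 1
        if not self._chunk:
            raise OSError("empty response for requested range")

    def read(self, size=-1):
        if size is None or size < 0:
            size = max(self._size - self._pos, 0)
        end = min(self._pos + size, self._size)
        out = bytearray()
        while self._pos < end:
            cached_end = self._chunk_start + len(self._chunk)
            if not self._chunk_start <= self._pos < cached_end:
                self._load_chunk(self._pos)
            offset = self._pos - self._chunk_start
            piece = self._chunk[offset:offset + (end - self._pos)]
            out += piece
            self._pos += len(piece)
        return bytes(out)


def open_remote_zip(url, chunk_size=1024 * 1024):
    """Open a remote zip archive without downloading it in full."""
    size, fetch = http_range_fetcher(url)
    return zipfile.ZipFile(HttpFile(size, fetch, chunk_size))
```

Passing the fetch function in separately keeps the class testable without network access: a test can supply a function that slices a local bytes object and then inspect `requests` to compare chunk sizes.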
I just checked, and the Google servers that host the ChromeOS images do support HTTP Range requests.
Obviously this would need some testing (e.g. with a proxy to see how many HTTP requests go out and what a good cache chunk size is).
Pro/Cons
Pro:
- More flexibility regarding disk space
- Inputstreamhelper could decide on its own (in the HttpFile class) whether chunks are stored in memory or on disk and how large the chunks should be
- Less data (number of bytes) might be downloaded from Google servers
Con:
- More HTTP requests to Google servers (number of requests) are sent (but this is configurable in the HttpFile class).
Alternatively, it would also be possible to only use this approach if less than the necessary disk space is available.
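The fallback decision could be as simple as the following sketch. `should_stream`, `download_dir` and `required_bytes` are hypothetical names; how inputstreamhelper actually determines the required space is not shown here.

```python
import shutil


def should_stream(download_dir, required_bytes):
    """Return True if download_dir has less free space than required_bytes."""
    free = shutil.disk_usage(download_dir).free
    return free < required_bytes
```

If it returns True, the HttpFile path would be used. Otherwise the existing download-to-disk path stays as it is.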
Has something like this been attempted before? What do you think?