image/copy: Fix missing progress reporting for chunked layers#848
image/copy: Fix missing progress reporting for chunked layers#848simek-m wants to merge 4 commits into
Conversation
| } | ||
| // Setup progress reporting and report a new artifact event | ||
| // if the channel available and a non-zero interval set. | ||
| if ic.c.options.Progress != nil && ic.c.options.ProgressInterval > 0 { |
There was a problem hiding this comment.
I was thinking of moving the nil checks from the caller side to the methods and allow nil receivers to clean it up a bit, but there is only one special-cased instance of this pattern (or two) in this codebase and it would be confusing, I think.
There was a problem hiding this comment.
This is not immediately relevant (bar.ProxyReader breaks it anyway), but various code in the standard library special-cases known implementations of io.Reader and the like (e.g., but not only, checking for things like WriterTo); so adding proxy objects might have a non-obvious performance cost — it’s better to make the whole proxy usage conditional than to always interpose a proxy and then have it be a no-op under some conditions.
(This does not really apply to GetBlobAt because that’s our private interface.)
mtrmac
left a comment
There was a problem hiding this comment.
Thanks!
Most importantly, the Podman-side tests seem broken/insufficient.
I don’t feel strongly about adding the extra progressReporter abstraction — but if we have it, it should be used consistently in the two users.
| } | ||
| total += int64(c.Length) | ||
| } | ||
| // Report read bytes if possible. |
There was a problem hiding this comment.
(This comment adds no value over the code itself, please drop.)
| @@ -0,0 +1,50 @@ | |||
| package copy | |||
There was a problem hiding this comment.
This works, but I think progress.go would be a better name long-term.
We don’t really need a separate file for every small type we define (the language does not require it, at least) — and if we move towards sharing more of the progress-reporting proxies (in particular, for containers/podman#20634 moving away from the current bar.ProxyReader), having them all in progress.go would allow keeping them together and making sure they are consistent.
There was a problem hiding this comment.
I wanted to move it because progress_bars.go was no longer true, but wasn't sure about having this small a file. I could see other examples, but noticed the tendency to keep more code together.
progress was my other option, I'll go with that.
| // reportRead fires the types.ProgressEventRead event with `bytesRead` | ||
| // to its progress channel. |
There was a problem hiding this comment.
- “fires… with
bytesRead” is not actually true most of the time - If we have this abstraction, fine, but then it would be useful for the abstraction to abstract the mechanism, i.e. for the documentation to say what the semantics is, not “fires $internalImplementationDetail”. E.g. “indicates to the internal channel that the progress has been finished” is not at all ideal but it’s better than “fires ProgressEventDone” because the user then has to look up what
ProgressEventDonemeans. Compare https://github.com/containers/container-libs/blob/main/image/AGENTS.md#documentation .
| // fully fetches the remaining data from the offset to the end of the blob. | ||
| // | ||
| // blobChunkAccessorProxy.GetBlobAt also updates a *progressBar | ||
| // and *progressReporter (if non-nil) with the number of bytes read. |
There was a problem hiding this comment.
The comment here was exactly copied from the interface definition … and I think that might be sufficient, just as a reminder of the contract.
| // that a new artifact will be read | ||
| channel <- types.ProgressProperties{ | ||
| // reportNewArtifact fires types.ProgressEventNewArtifact to its progress channel. | ||
| func (r *progressReporter) reportNewArtifact() { |
There was a problem hiding this comment.
I think this is better done in the constructor; that makes the callers shorter and they can’t forget.
In particular the mechanism of the object requires lastUpdate to be initialized. Now, technically, we could just document that creators of the object must immediately call reportNewArtifact but it’s just easier for everyone to use a constructor.
(Just because Go allows external callers to initialize objects themselves, that doesn’t make it a good idea — it completely breaks the idea of encapsulation or abstraction.)
There was a problem hiding this comment.
I'm used to the doctrine that constructors should only initialize in Go, but that's completely subjective.
The lastUpdate is indeed problematic, good point. I'll stick to the constructor for that.
(Yes, it's a big limitation of "constructors" in Go that you cannot prevent struct literal initialization.)
| // progressReader is an io.Reader that reports its progress to | ||
| // an underlying *progressReporter. |
There was a problem hiding this comment.
The API is not shaped that way.
The comment seems to suggest that users should construct a progressReporter and then use it from a separate progressReader, but actually progressReader is a ~subclass and users are not expected to interact with the progressReporter as a separate instance.
In general, it’s surprising that the shape of integration between progressReader (a ~superclass) and blobChunkAccessorProxy (a property updated after initialization) is so different.
| // Setup progress reporting and report a new artifact event | ||
| // if the channel available and a non-zero interval set. |
There was a problem hiding this comment.
Already obvious in the code, please drop the comment.
|
|
||
| // Report completion for an artifact if possible. | ||
| if proxy.reporter != nil { | ||
| proxy.reporter.reportDone() |
There was a problem hiding this comment.
- Reaching into
proxywhich now “owns” the reporter this way is a bit inelegant (and reportedly an anti-pattern in OO philosophy, which I don’t care about so much). Who is responsible for doing this? If it is this function, it could just as well hold to thereporteritself… (see elsewhere about field vs. superclass vs…) - So when is
reportDoneexpected to be called? Always or only on success?copyBlobFromStreamdoes it unconditionally. [See what I mean about documenting methods at the relevant abstraction level, not “fires …”?] - I didn’t realize originally, and this complicates things: We need to do something reasonable on
ErrFallbackToOrdinaryLayerDownload. I suppose reporting the original access as done / aborted, and starting a new report with the sameprogressComponentIDbut starting with offset 0 would be plausible… or maybe we need a newProgressEventRestartingevent?! I suppose this needs looking at the Podman remote API to see whether that would like to receive some specific data.
| } | ||
| // Setup progress reporting and report a new artifact event | ||
| // if the channel available and a non-zero interval set. | ||
| if ic.c.options.Progress != nil && ic.c.options.ProgressInterval > 0 { |
There was a problem hiding this comment.
This is not immediately relevant (bar.ProxyReader breaks it anyway), but various code in the standard library special-cases known implementations of io.Reader and the like (e.g., but not only, checking for things like WriterTo); so adding proxy objects might have a non-obvious performance cost — it’s better to make the whole proxy usage conditional than to always interpose a proxy and then have it be a no-op under some conditions.
(This does not really apply to GetBlobAt because that’s our private interface.)
| artifact types.BlobInfo // The blob metadata which is currently being progressed | ||
| lastUpdate time.Time // The last time a progress channel event was sent | ||
| offset uint64 // The currently downloaded size in bytes | ||
| offsetUpdate uint64 // The number of bytes downloaded within the last update interval |
There was a problem hiding this comment.
| offsetUpdate uint64 // The number of bytes downloaded within the last update interval | |
| offsetUpdate uint64 // The number of bytes downloaded since lastUpdate |
then — it’s shorter and clearer.
| // | ||
| // blobChunkAccessorProxy.GetBlobAt also updates a *progressBar | ||
| // and *progressReporter (if non-nil) with the number of bytes read. | ||
| func (s *blobChunkAccessorProxy) GetBlobAt(ctx context.Context, info types.BlobInfo, chunks []private.ImageSourceChunk) (chan io.ReadCloser, chan error, error) { |
There was a problem hiding this comment.
Non-blocking: A unit test wouldn’t hurt.
There was a problem hiding this comment.
I added tests for the reporter itself but not for GetBlobAt as it interacts with the progressBar and wasn't sure it would add some value to test it separately.
If you think, they should exist, I'll, of course, add them.
There was a problem hiding this comment.
ACK, we don’t want/need to mock the MPB progress bars for this.
|
@mtrmac Thank you, I responded to some of the more obvious comments. I'll go through the rest later and change the code accordingly. |
The `PutBlobPartial` code path only updated a progress bar, but it did not report its progress to the `copy.Options.Progress` channel for chunked layers. Add missing reporting and refactor parts shared with `progressReader`. Fixes: containers#469 Signed-off-by: Marek Simek <msimek@redhat.com>
Signed-off-by: Marek Simek <msimek@redhat.com>
e67f898 to
80883b6
Compare
| // TestNewProgressReporter verifies that constructing a reporter | ||
| // signals a new artifact event. | ||
| func TestNewProgressReporter(t *testing.T) { | ||
| channel := make(chan types.ProgressProperties, 1) |
There was a problem hiding this comment.
The buffered channel is used for simplicity to avoid a race because the newProgressReporter constructor sends the event and then returns.
|
I hope changes in 80883b6 reflect your comments and what we discussed.
Edit: There was an omission in the previous commit. |
Signed-off-by: Marek Simek <msimek@redhat.com>
…layers Signed-off-by: Marek Simek <msimek@redhat.com>
| ) | ||
| defer progressReader.reportDone() | ||
| stream.reader = progressReader | ||
| // === Wrap stream with progress reporting if a reporter was provided. |
There was a problem hiding this comment.
(For these “pipeline steps” (“section headers”), documenting what happens is more important than how exactly, unless the “how exactly” is non-trivial.)
| @@ -145,11 +145,12 @@ func (bar *progressBar) mark100PercentComplete() { | |||
| } | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
I think it would still make sense to have progress_bars.go dealing with mpb and the specifics of the layout, vs. progress.go for the things shared between progress_bars.go and progress_channel.go (i.e. the blobChunkAccessorProxy).
| artifact: artifact, | ||
| lastUpdate: time.Now(), | ||
| offset: 0, | ||
| offsetUpdate: 0, |
There was a problem hiding this comment.
(Absolutely non-blocking: I think keeping explicit initializers here is a tiny bit better because we do rely on having these values exactly.)
| lastUpdate time.Time | ||
| offset uint64 | ||
| offsetUpdate uint64 | ||
| // progressReporter facilitates progress reporting through its |
There was a problem hiding this comment.
“facilitates” means nothing specific.
Maybe something vaguely like “reports progress about a single blob to a types.ProgressProperties channel [and supports re-starting from zero without reporting the negative progress through the channel]”
| // the processing has to be re-started | ||
| // (e.g. ErrFallbackToOrdinaryLayerDownload). | ||
| func (r *progressReporter) reset() { | ||
| r.offset = 0 |
There was a problem hiding this comment.
In this situation, we don’t want the GUI progress bars to decrease; just note that the current offset is 0 but remember that we reported N, and then report no progress until we reach N again.
| } | ||
|
|
||
| destInfo, err := ic.copyBlobFromStream(ctx, bytes.NewReader(configBlob), srcInfo, nil, true, false, bar, -1, false) | ||
| destInfo, err := ic.copyBlobFromStream(ctx, bytes.NewReader(configBlob), srcInfo, nil, true, false, bar, -1, false, nil) |
There was a problem hiding this comment.
Configs should still report progress, discussed elsewhere.
| } | ||
| } | ||
|
|
||
| // Only report completion on success. |
There was a problem hiding this comment.
(Discussed elsewhere: This could happen in copyBlobFromStream, as long as the function documents that. Alternatively, allow a no-op reporter where we can reporter.reportDone() without the condition.)
| } | ||
| if ic.c.options.Progress != nil && ic.c.options.ProgressInterval > 0 { | ||
| reporter = newProgressReporter(ic.c.options.Progress, ic.c.options.ProgressInterval, srcInfo) | ||
| proxy.reporter = reporter |
There was a problem hiding this comment.
I think reordering to create the reporter first and the proxy second would be a bit cleaner.
| // | ||
| // blobChunkAccessorProxy.GetBlobAt also updates a *progressBar | ||
| // and *progressReporter (if non-nil) with the number of bytes read. | ||
| func (s *blobChunkAccessorProxy) GetBlobAt(ctx context.Context, info types.BlobInfo, chunks []private.ImageSourceChunk) (chan io.ReadCloser, chan error, error) { |
There was a problem hiding this comment.
ACK, we don’t want/need to mock the MPB progress bars for this.
| assert.Equal(t, types.ProgressProperties{ | ||
| Event: types.ProgressEventNewArtifact, | ||
| Artifact: artifact, | ||
| }, <-channel, "constructor should send a new artifact event") |
There was a problem hiding this comment.
(I think it would help reliability of the tests to have a consumeNonblocking helper that returns all current []ProgressProperties; we could then also test that operations within the internal report nothing.)
The
PutBlobPartialcode path only updated a progress bar, but it did not report its progress to thecopy.Options.Progresschannel for chunked layers.This PR adds missing reporting and refactors parts shared with
progressReader.The Podman REST API
images/pullwith thepullProgressflag introduced in containers/podman#28224 did not get progress updates from the channel and the output stream lacked events for partial blobs.Test added in: containers/podman#28713
Fixes: #469
API pullProgress stream with this fix