Problem
When downloading a backup, downloadTableMetadata() keeps retrying when a metadata file is missing. If a table was dropped or renamed during upload, its .json metadata file may not exist on remote storage, but the retry loop still runs through every attempt with exponential backoff, wasting minutes before ultimately failing the entire download.
Current behavior
// pkg/backup/download.go — downloadTableMetadata()
retry := retrier.New(retrier.ExponentialBackoff(b.cfg.General.RetriesOnFailure, ...), b)
err := retry.RunCtx(ctx, func(ctx context.Context) error {
	tmReader, err := b.dst.GetFileReader(ctx, remoteMetadataFile)
	if err != nil {
		return errors.Wrapf(err, "can't GetFileReader(%s) error", remoteMetadataFile)
	}
	// ...
})
When GetFileReader returns "object doesn't exist" / "NoSuchKey" / "StatusCode 404", the retrier treats it like any other transient error and retries up to RetriesOnFailure times with exponential backoff. But a 404 is permanent: the object will never appear, so every retry is wasted time.
Proposed fix
Detect permanent "not found" errors and break out of the retry loop immediately:
notFoundErr := false // set inside the retry callback when the object is missing
retry := retrier.New(retrier.ExponentialBackoff(b.cfg.General.RetriesOnFailure, ...), b)
err := retry.RunCtx(ctx, func(ctx context.Context) error {
	tmReader, err := b.dst.GetFileReader(ctx, remoteMetadataFile)
	if err != nil {
		// "object doesn't exist" is permanent: flag it and stop retrying
		if strings.Contains(err.Error(), "doesn't exist") ||
			strings.Contains(err.Error(), "key not found") ||
			strings.Contains(err.Error(), "NoSuchKey") ||
			strings.Contains(err.Error(), "StatusCode 404") {
			notFoundErr = true
			return nil // returning nil ends the retry loop
		}
		return errors.Wrapf(err, "can't GetFileReader(%s) error", remoteMetadataFile)
	}
	// ...
})
// After retry loop:
if notFoundErr {
	log.Warn().Str("remoteMetadataFile", remoteMetadataFile).
		Msg("metadata file not found on remote, skipping table")
	continue
}
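The classification step in the fix can be tried in isolation. The following is a minimal, self-contained sketch and not the project's code: isNotFound, fetchWithRetry, notFoundMarkers, errMissing and errTimeout are hypothetical names, the error messages are made up, and a plain loop without any waiting stands in for the retrier library. It only illustrates that a "not found" error ends the loop after one attempt while other errors use every attempt.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// notFoundMarkers holds the substrings the proposed fix matches on.
var notFoundMarkers = []string{"doesn't exist", "key not found", "NoSuchKey", "StatusCode 404"}

// Made-up sample errors for the demo; real storage backends word these differently.
var (
	errMissing = errors.New("NoSuchKey: metadata/default/t1.json")
	errTimeout = errors.New("connection reset by peer")
)

// isNotFound (hypothetical helper) reports whether err looks like a
// permanent "object not found" error, based on its message text.
func isNotFound(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	for _, marker := range notFoundMarkers {
		if strings.Contains(msg, marker) {
			return true
		}
	}
	return false
}

// fetchWithRetry (hypothetical) calls fetch up to maxAttempts times.
// It stops early on success or on a "not found" error; in the latter
// case it reports notFound=true and no error so the caller can skip the item.
func fetchWithRetry(maxAttempts int, fetch func() error) (attempts int, notFound bool, err error) {
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		err = fetch()
		if err == nil {
			return attempts, false, nil
		}
		if isNotFound(err) {
			return attempts, true, nil
		}
	}
	return maxAttempts, false, err
}

func main() {
	// Missing object: one attempt, flagged as not found.
	fmt.Println(fetchWithRetry(5, func() error { return errMissing })) // 1 true <nil>
	// Other failure: all attempts are used and the last error is returned.
	fmt.Println(fetchWithRetry(3, func() error { return errTimeout })) // 3 false connection reset by peer
}
```

Matching on message substrings mirrors the fix above; it depends on the exact error text each storage backend produces, so the list of markers may need extending per backend.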
When this happens
- Table dropped during backup upload (backup started, table dropped, metadata never uploaded)
- Incremental backup where base backup's metadata was cleaned up
- Partial upload failure where some tables' metadata was never written
- Table renamed between backup create and upload
Impact
Without this fix, a single missing metadata file stalls the entire download for the full retry schedule (RetriesOnFailure attempts with exponentially growing waits, typically 5-10 minutes) before failing. With this fix, the missing table is skipped in milliseconds and the rest of the backup downloads successfully.
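The length of the stall is the sum of the waits between attempts. As a rough sketch, assuming each wait is double the previous one, the total can be computed as below. totalBackoff is a hypothetical helper, and the retry counts and initial delays are illustrative values, not taken from the project's configuration.

```go
package main

import (
	"fmt"
	"time"
)

// totalBackoff sums the waits of a doubling backoff schedule:
// initial + 2*initial + 4*initial + ... for `retries` waits.
func totalBackoff(retries int, initial time.Duration) time.Duration {
	total := time.Duration(0)
	wait := initial
	for i := 0; i < retries; i++ {
		total += wait
		wait *= 2
	}
	return total
}

func main() {
	// Hypothetical values chosen only to show how quickly the total grows.
	fmt.Println(totalBackoff(3, time.Second))    // 7s
	fmt.Println(totalBackoff(5, 10*time.Second)) // 5m10s
}
```

Under these assumptions the total is initial × (2^retries − 1), so each additional retry roughly doubles the time spent waiting on an object that will never appear.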