Skip to content

Improve retrying downloads code and testing#1559

Merged
yarikoptic merged 20 commits intomasterfrom
gh-1556
Feb 12, 2025
Merged

Improve retrying downloads code and testing#1559
yarikoptic merged 20 commits intomasterfrom
gh-1556

Conversation

@yarikoptic
Copy link
Copy Markdown
Member

Follow up to

We are into holidays already, so it is understood if review does not come. I will reserve the "right" to merge if there is user demand but likely we could just wait. If anyone sees "clearly" how to improve -- feel welcome to push directly.

@yarikoptic yarikoptic added the patch Increment the patch version when merged label Dec 20, 2024
@codecov
Copy link
Copy Markdown

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 90.09009% with 11 lines in your changes missing coverage. Please review.

Project coverage is 88.65%. Comparing base (746650f) to head (2482103).
Report is 112 commits behind head on master.

Files with missing lines Patch % Lines
dandi/dandiarchive.py 14.28% 6 Missing ⚠️
dandi/dandiapi.py 33.33% 2 Missing ⚠️
dandi/utils.py 92.30% 2 Missing ⚠️
dandi/download.py 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1559      +/-   ##
==========================================
+ Coverage   88.34%   88.65%   +0.30%     
==========================================
  Files          78       78              
  Lines       10735    10829      +94     
==========================================
+ Hits         9484     9600     +116     
+ Misses       1251     1229      -22     
Flag Coverage Δ
unittests 88.65% <90.09%> (+0.30%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@candleindark candleindark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, _check_if_more_attempts_allowed is equally, if not more, about sleeping/waiting before sending out another requests. I think it would be beneficial to rename it, or in some other way, to indicate this nature.

If we want to improve the retrying further, I think we can consider incorporating some well-known and recommended algorithms for retrying requests to a server while avoiding overwhelming it. https://chatgpt.com/share/676f8290-86cc-800a-9aa8-1ad952282d27.

)
lgr.debug(
"%s - download failed due to response %d with "
"Retry-After=%d: %s, will sleep and retry",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "Retry-After=[]" in this log is now insistent and sleep_amount may not be a meaningful number if retry_after failed to be parsed into a datetime object

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I do not get what [] you have in mind since it can't be a list... please rereview when I push changes -- may be the issue is addressed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I meant that you have numerous logs with inconsistent labelling with "Retry-After=". Some of them you set it to the header value of the "Retry-After" (understandable), others you set it to sleep_amount which can originate from, but not exactly, the passed in header value for "Retry-After" and can default to -1.0.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check if it is all consistent now -- seems to me, always seems to point to retry_after.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Retry-After=[]s in log messages, where [] is used to indicate an expression of %d or %s, are consistent within the get_retry_after(), but this last one is still inconsistent with the ones in get_retry_after(). This last one prints the int value of sleep_amount, the result of get_retry_after(), and the others print whatever the value of the HTTP header Retry-After which can be a string of a specific date.

I think we can just remove the use of Retry-After= here. There is plenty of logged messages within get_retry_after()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this log statement should be removed entirely. Because of the containing if-statement, this log statement doesn't always get executed for all responses that have a Retry-After header. However, at the same time, I think the original intent is to execute this log statement for all responses with a Retry-After header.

We are better off if we just replace the entire if statement:

        if (sleep_amount := get_retry_after(exc.response)) is not None:
            lgr.debug(
                "%s - download failed due to response %d with "
                "Retry-After=%d: %s, will sleep and retry",
                path,
                exc.response.status_code,
                sleep_amount,
                exc,
            )

with

sleep_amount = get_retry_after(exc.response)

and unindent the next log statement as I recommended below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! done that, pushed

datetime.UTC  was introduced only in 3.11, so type checking which uses 3.9
ATM fails rightfully, as unittesting on 3.9 and 3.10
…ETRY_CODES are used

It also add additional checks/protection against odd retry-after results (too long negatives or positives),
which should address some concerns raised in prior code review
* origin/master:
  Update CHANGELOG.md [skip ci]
  Constrain version pin
  Further refine version pinning
  Pin `dandischema` to require the latest version
  Start Zarr download as soon as first page of entries is obtained
  Update CHANGELOG.md [skip ci]
  Use Python 3.10 to build docs
  Support dandischema v0.11.0
  Fix typo in `dandi move` docstring
  Don't use version 0.25.5 of `responses`
  Update URL for DANDI Docs
yarikoptic and others added 6 commits February 3, 2025 16:37
@yarikoptic
Copy link
Copy Markdown
Member Author

rerunning windows which puked with

FAILED dandi/tests/test_dandiarchive.py::test_parse_api_url_glob[https://gui.dandiarchive.org/#/dandiset/001001/draft/files?location=sub-RAT123/*.nwb-parsed_url1] - requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

@yarikoptic yarikoptic requested a review from jwodder February 5, 2025 01:22
dandi/utils.py Outdated
elif sleep_amount < 0:
sleep_amount = None
lgr.warning(
"date in Retry-After=%r is in the past (current is %r). "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to do "Retry-After=%s" instead of "Retry-After=%r" if you want the HTTP header appears exactly the same as they are received, without the quotes.

)
lgr.debug(
"%s - download failed due to response %d with "
"Retry-After=%d: %s, will sleep and retry",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Retry-After=[]s in log messages, where [] is used to indicate an expression of %d or %s, are consistent within the get_retry_after(), but this last one is still inconsistent with the ones in get_retry_after(). This last one prints the int value of sleep_amount, the result of get_retry_after(), and the others print whatever the value of the HTTP header Retry-After which can be a string of a specific date.

I think we can just remove the use of Retry-After= here. There is plenty of logged messages within get_retry_after()

Comment on lines +1137 to +1142
lgr.debug(
"%s - download failed on attempt #%d: %s, will sleep a bit and retry",
"%s - download failed on attempt #%d: %s, will sleep %f and retry",
path,
attempt,
exc,
sleep_amount,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just unindent these lines, and get the log statement out of the if-statement and remove the last log statement at line 1126. See the comments for the last log statement for more details.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, makes sense, dedenting the log so we always log how long we are to sleep, not only if random amount due to no Retry-After. I did remove that other log

)
lgr.debug(
"%s - download failed due to response %d with "
"Retry-After=%d: %s, will sleep and retry",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this log statement should be removed entirely. Because of the containing if-statement, this log statement doesn't always get executed for all responses that have a Retry-After header. However, at the same time, I think the original intent is to execute this log statement for all responses with a Retry-After header.

We are better off if we just replace the entire if statement:

        if (sleep_amount := get_retry_after(exc.response)) is not None:
            lgr.debug(
                "%s - download failed due to response %d with "
                "Retry-After=%d: %s, will sleep and retry",
                path,
                exc.response.status_code,
                sleep_amount,
                exc,
            )

with

sleep_amount = get_retry_after(exc.response)

and unindent the next log statement as I recommended below.

… Retry-After= etc

Co-authored-by: Isaac To <candleindark@users.noreply.github.com>

Was amended locally later due to a dirty suggestion applied
@yarikoptic yarikoptic added the release Create a release when this pr is merged label Feb 12, 2025
@yarikoptic yarikoptic merged commit 527347c into master Feb 12, 2025
25 checks passed
@yarikoptic yarikoptic deleted the gh-1556 branch February 12, 2025 19:58
@github-actions
Copy link
Copy Markdown

🚀 PR was released in 0.66.7 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patch Increment the patch version when merged release Create a release when this pr is merged released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants