-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-99730: HEAD should remain HEAD request on redirect #99731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
For reasons beyond me Python thinks it's a great idea to upgrade HEAD requests to GET requests when following redirects. So, this PR adds a better `HTTPRedirectHandler`, and also moves some ad-hoc logic around for dealing with disabling SSL certs verification. Also, I'm stumped by the fact that Spack's `url_exists` does not use HEAD requests at all, so in certain cases Spack awkwardly downloads something first to see if it can download it, and then downloads it again because it knows it can download it. So, this PR ensures that both urllib and botocore use HEAD requests. Finally, it also removes some things that were there to support currently unsupported Python versions. Notice that the HTTP spec [section 10.3.2](https://datatracker.ietf.org/doc/html/rfc2616.html#section-10.3.2) just talks about how to deal with POST request on redirect (whether to follow or not): > If the 301 status code is received in response to a request other > than GET or HEAD, the user agent MUST NOT automatically redirect the > request unless it can be confirmed by the user, since this might > change the conditions under which the request was issued. > Note: When automatically redirecting a POST request after > receiving a 301 status code, some existing HTTP/1.0 user agents > will erroneously change it into a GET request. Python has a comment about this, they choose to go with the "erroneous change". But they then mess up the HEAD request while following the redirect, probably because they were too busy discussing how to deal with POST. See python/cpython#99731
For reasons beyond me Python thinks it's a great idea to upgrade HEAD requests to GET requests when following redirects. So, this PR adds a better `HTTPRedirectHandler`, and also moves some ad-hoc logic around for dealing with disabling SSL certs verification. Also, I'm stumped by the fact that Spack's `url_exists` does not use HEAD requests at all, so in certain cases Spack awkwardly downloads something first to see if it can download it, and then downloads it again because it knows it can download it. So, this PR ensures that both urllib and botocore use HEAD requests. Finally, it also removes some things that were there to support currently unsupported Python versions. Notice that the HTTP spec [section 10.3.2](https://datatracker.ietf.org/doc/html/rfc2616.html#section-10.3.2) just talks about how to deal with POST request on redirect (whether to follow or not): > If the 301 status code is received in response to a request other > than GET or HEAD, the user agent MUST NOT automatically redirect the > request unless it can be confirmed by the user, since this might > change the conditions under which the request was issued. > Note: When automatically redirecting a POST request after > receiving a 301 status code, some existing HTTP/1.0 user agents > will erroneously change it into a GET request. Python has a comment about this, they choose to go with the "erroneous change". But they then mess up the HEAD request while following the redirect, probably because they were too busy discussing how to deal with POST. See python/cpython#99731
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me.
Is there a way to test whether this is breaking in practice? For example, I've noticed that a Gitlab instance (forgot which) had download URLs for release assets, that returned a redirect to an AWS bucket with a URL that contains a signature. What's signed is the request as a whole, including the HTTP verb. It would only work with GET, HEAD requests would error with 403. So, following the redirect required a GET request. Not sure whose bug that is. |
It seems that the error is with Gitlab/AWS, but easily worked around by not doing HEAD requests to a service that doesn't support them. @orsenthil, as the urllib expert, do you have an opinion? |
|
Looks like it's complicated. The RFCs suggest "user interaction" in certain cases which is obviously impossible. And there's different behavior for different types of 3xx status codes. Looks like they realized that redirect on POST request is somewhat ambiguous: either it means the POST request has to be sent elsewhere, or, the POST request succeeded, you just have to GET the (success) response elsewhere. The 303 message resolves that ambiguity. I have no clue if any modern webserver is following the RFC correctly, they might still reply with 302 on success (?) -- the POST behavior in urllib makes sense to me. From RFC 2616 10.3.2: 301 Moved Permanently
From RFC 2616 10.3.3: 302 Found
And RFC 2616 10.3.4: 303 See Other
And RFC 2626 10.3.8: 307 Temporary Redirect
|
IMO, while it's not clear whether the redirected request should be made, it's pretty clear that if it is made, HEAD should stay HEAD rather than turn into GET. The behaviour on 307 Temporary Redirect does not follow the standard, but, that's not what this PR is about. |
Is there anything I can do to help with this? |
@orsenthil Do you think you can find the time before Python 3.13 beta1? |
I plan to merge this around Wednesday if there are no objections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@encukou - I apologize, I forgot to get back to this issue and discussion. I spent time reviewing the RFCs, although it surprises me that we have this behavior (of HEAD upgrading to GET) for a long time, I was not able to find a requirement to this behavior in the RFCs. Further, the clients like curl, which I often used as a reference, do no upgrade HEAD to GET. So this change is correct thing to do. There is a possibility of some client relying on this archaic behavior, but they wont be adversely affected, and they could handle this a GET again. Thanks for change, and pushing this through. |
Thank you for checking! |
Uh oh!
There was an error while loading. Please reload this page.