gh-99730: HEAD should remain HEAD request on redirect #99731

haampie · 2022-11-23T17:00:52Z

Issue: HEAD requests should be HEAD requests upon redirect #99730

bedevere-bot · 2022-11-23T17:00:58Z

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

ghost · 2022-11-23T17:01:19Z

All commit authors signed the Contributor License Agreement.

bedevere-bot · 2022-11-23T17:12:19Z

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

For reasons beyond me Python thinks it's a great idea to upgrade HEAD requests to GET requests when following redirects. So, this PR adds a better `HTTPRedirectHandler`, and also moves some ad-hoc logic around for dealing with disabling SSL certs verification. Also, I'm stumped by the fact that Spack's `url_exists` does not use HEAD requests at all, so in certain cases Spack awkwardly downloads something first to see if it can download it, and then downloads it again because it knows it can download it. So, this PR ensures that both urllib and botocore use HEAD requests. Finally, it also removes some things that were there to support currently unsupported Python versions. Notice that the HTTP spec [section 10.3.2](https://datatracker.ietf.org/doc/html/rfc2616.html#section-10.3.2) just talks about how to deal with POST request on redirect (whether to follow or not): > If the 301 status code is received in response to a request other > than GET or HEAD, the user agent MUST NOT automatically redirect the > request unless it can be confirmed by the user, since this might > change the conditions under which the request was issued. > Note: When automatically redirecting a POST request after > receiving a 301 status code, some existing HTTP/1.0 user agents > will erroneously change it into a GET request. Python has a comment about this, they choose to go with the "erroneous change". But they then mess up the HEAD request while following the redirect, probably because they were too busy discussing how to deal with POST. See python/cpython#99731

Lib/test/test_urllib2.py

encukou

This makes sense to me.

haampie · 2024-03-04T15:59:05Z

Is there a way to test whether this is breaking in practice?

For example, I've noticed that a Gitlab instance (forgot which) had download URLs for release assets, that returned a redirect to an AWS bucket with a URL that contains a signature. What's signed is the request as a whole, including the HTTP verb. It would only work with GET, HEAD requests would error with 403.

So, following the redirect required a GET request.

Not sure whose bug that is.

encukou · 2024-03-06T13:49:20Z

It seems that the error is with Gitlab/AWS, but easily worked around by not doing HEAD requests to a service that doesn't support them.

@orsenthil, as the urllib expert, do you have an opinion?

orsenthil · 2024-03-09T01:01:37Z

do you have an opinion?
This change sounds reasonable and correct to me. I will find out why it was GET prior to this change and check the RFCs before approving and merging.

haampie · 2024-03-09T07:57:05Z

Looks like it's complicated.

The RFCs suggest "user interaction" in certain cases which is obviously impossible.

And there's different behavior for different types of 3xx status codes.

Looks like they realized that redirect on POST request is somewhat ambiguous: either it means the POST request has to be sent elsewhere, or, the POST request succeeded, you just have to GET the (success) response elsewhere. The 303 message resolves that ambiguity. I have no clue if any modern webserver is following the RFC correctly, they might still reply with 302 on success (?) -- the POST behavior in urllib makes sense to me.

From RFC 2616 10.3.2: 301 Moved Permanently

Note: When automatically redirecting a POST request after
receiving a 301 status code, some existing HTTP/1.0 user agents
will erroneously change it into a GET request.

From RFC 2616 10.3.3: 302 Found

Note: RFC 1945 and RFC 2068 specify that the client is not allowed
to change the method on the redirected request. However, most
existing user agent implementations treat 302 as if it were a 303
response, performing a GET on the Location field-value regardless
of the original request method. The status codes 303 and 307 have
been added for servers that wish to make unambiguously clear which
kind of reaction is expected of the client.

And RFC 2616 10.3.4: 303 See Other

The response to the request can be found under a different URI and
SHOULD be retrieved using a GET method on that resource. This method
exists primarily to allow the output of a POST-activated script to
redirect the user agent to a selected resource.

And RFC 2626 10.3.8: 307 Temporary Redirect

If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.

encukou · 2024-03-11T09:01:21Z

IMO, while it's not clear whether the redirected request should be made, it's pretty clear that if it is made, HEAD should stay HEAD rather than turn into GET.
As I read it, the standard's wording on 303 See Other says what to do to retrieve the resource. But making a HEAD request means you don't want to retrieve the resource :)

The behaviour on 307 Temporary Redirect does not follow the standard, but, that's not what this PR is about.

encukou · 2024-03-25T15:12:41Z

@orsenthil,

This change sounds reasonable and correct to me. I will find out why it was GET prior to this change and check the RFCs before approving and merging.

Is there anything I can do to help with this?

encukou · 2024-04-25T14:26:53Z

I will find out why it was GET prior to this change and check the RFCs before approving and merging.

@orsenthil Do you think you can find the time before Python 3.13 beta1?

encukou · 2024-04-29T14:29:15Z

I plan to merge this around Wednesday if there are no objections.

orsenthil

LGTM

orsenthil · 2024-05-01T12:38:46Z

@encukou -

I apologize, I forgot to get back to this issue and discussion. I spent time reviewing the RFCs, although it surprises me that we have this behavior (of HEAD upgrading to GET) for a long time, I was not able to find a requirement to this behavior in the RFCs. Further, the clients like curl, which I often used as a reference, do no upgrade HEAD to GET. So this change is correct thing to do.

There is a possibility of some client relying on this archaic behavior, but they wont be adversely affected, and they could handle this a GET again.

Thanks for change, and pushing this through.

encukou · 2024-05-01T16:00:56Z

Thank you for checking!

…H-99731)

HEAD requests be HEAD requests upon redirect

13ac4a9

bedevere-bot mentioned this pull request Nov 23, 2022

HEAD requests should be HEAD requests upon redirect #99730

Closed

bedevere-bot added the awaiting review label Nov 23, 2022

test

8cd2bd7

📜🤖 Added by blurb_it.

122c8db

haampie changed the title ~~gh-99730: HEAD remain HEAD requests on redirect~~ gh-99730: HEAD should remain HEAD request on redirect Nov 23, 2022

haampie mentioned this pull request Nov 23, 2022

url_exists related improvements spack/spack#34095

Merged

bcail reviewed Oct 5, 2023

View reviewed changes

Lib/test/test_urllib2.py Outdated Show resolved Hide resolved

Merge branch 'main' into patch-1

a468244

encukou approved these changes Mar 4, 2024

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting review labels Mar 4, 2024

encukou added 2 commits April 29, 2024 16:28

Fix HTTP status description

a74f769

Merge branch 'main' into patch-1

90739de

orsenthil approved these changes May 1, 2024

View reviewed changes

encukou merged commit 759e8e7 into python:main May 1, 2024

bedevere-app bot removed the awaiting merge label May 1, 2024

SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024

pythongh-99730: urllib.request: Keep HEAD method on redirect (pythonG…

1a1771b

…H-99731)

haampie deleted the patch-1 branch February 17, 2025 16:46

Uh oh!

gh-99730: HEAD should remain HEAD request on redirect #99731

gh-99730: HEAD should remain HEAD request on redirect #99731

Uh oh!

Conversation

haampie commented Nov 23, 2022 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bedevere-bot commented Nov 23, 2022

Uh oh!

ghost commented Nov 23, 2022 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bedevere-bot commented Nov 23, 2022

Uh oh!

Uh oh!

encukou left a comment

Choose a reason for hiding this comment

Uh oh!

haampie commented Mar 4, 2024

Uh oh!

encukou commented Mar 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

orsenthil commented Mar 9, 2024

Uh oh!

haampie commented Mar 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

encukou commented Mar 11, 2024

Uh oh!

encukou commented Mar 25, 2024

Uh oh!

encukou commented Apr 25, 2024

Uh oh!

encukou commented Apr 29, 2024

Uh oh!

orsenthil left a comment

Choose a reason for hiding this comment

Uh oh!

orsenthil commented May 1, 2024

Uh oh!

encukou commented May 1, 2024

Uh oh!

Uh oh!

haampie commented Nov 23, 2022 •

edited by bedevere-bot

Loading

ghost commented Nov 23, 2022 •

edited by ghost

Loading

encukou commented Mar 6, 2024 •

edited

Loading

haampie commented Mar 9, 2024 •

edited

Loading