Attempt at double decoding of PATH_INFO when using Django and CookieJar #274

@interDist

I was getting a UserWarning: http.cookiejar bug! for a Unicode URL, and after 4 hours of debugging I seem to have found the culprit.

After Webtest calls req.get_response for a Django WSGI application, the environment changes: PATH_INFO no longer contains the raw URL-unquoted path but its UTF-8-decoded equivalent. However, when extracting the cookies, Webtest still assumes that the path has the raw value, resulting in the following stack trace:

  File "/usr/lib/python3.12/http/cookiejar.py", line 1628, in make_cookies
    ns_cookies = self._cookies_from_attrs_set(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 1583, in _cookies_from_attrs_set
    cookie = self._cookie_from_cookie_tuple(tup, request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 1532, in _cookie_from_cookie_tuple
    req_host, erhn = eff_request_host(request)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 642, in eff_request_host
    erhn = req_host = request_host(request)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 627, in request_host
    url = request.get_full_url()
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webtest/utils.py", line 125, in get_full_url
    return self._request.url
           ^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/request.py", line 497, in url
    url = self.path_url
          ^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/request.py", line 469, in path_url
    bpath_info = bytes_(self.path_info, self.url_encoding)
                        ^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/descriptors.py", line 70, in fget
    return req.encget(key, encattr=encattr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/request.py", line 167, in encget
    return bytes_(val, 'latin-1').decode(encoding)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/compat.py", line 33, in bytes_
    return s.encode(encoding, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0109' in position 1: ordinal not in range(256)
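
The failure in the last frames can probably be reproduced without Webtest or Django. The sketch below uses only the standard library; the helper names wsgi_path_info and encget_like are hypothetical and merely imitate the `bytes_(val, 'latin-1').decode(encoding)` line shown in the traceback, assuming PATH_INFO starts out as raw bytes decoded as latin-1.

```python
from urllib.parse import unquote_to_bytes


def wsgi_path_info(quoted_path: str) -> str:
    """Hypothetical helper: unquote the path and decode the raw bytes as latin-1."""
    return unquote_to_bytes(quoted_path).decode("latin-1")


def encget_like(value: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper imitating the webob line from the traceback."""
    return value.encode("latin-1").decode(encoding)


raw = wsgi_path_info("/%C4%89")   # raw form: '/\xc4\x89'
decoded = encget_like(raw)        # first decode succeeds: '/\u0109'

try:
    encget_like(decoded)          # decoding the already-decoded value
    double_decode_failed = False
except UnicodeEncodeError:
    # '\u0109' does not fit in latin-1, as in the reported error
    double_decode_failed = True
```

The first call works because the raw value only contains code points below 256; the second fails because the value has already been decoded once.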

The modification of the environment happens because of how Django’s WSGIRequest handles it. In particular, this line decodes the PATH_INFO:

        path_info = get_path_info(environ) or "/"

and these 2 lines modify the PATH_INFO in the environ:

        self.META = environ
        self.META["PATH_INFO"] = path_info

For example, performing a GET request of “/ab%C4%87%C4%8F%C4%99f” results in the following debug output:

inside get(), url='/ab%C4%87%C4%8F%C4%99f'
   req.url_encoding=UTF-8, req.environ['PATH_INFO']=/abÄÄÄf, req.path_info=/abćďęf
inside do_request(), PATH_INFO=/abÄÄÄf
before get response, PATH_INFO=/abÄÄÄf
   inside TestRequest.call_application, environ['PATH_INFO']=/abÄÄÄf
   ...
   inside TestRequest.call_application, environ['PATH_INFO']=/abćďęf
after get response, PATH_INFO=/abćďęf 
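
The before/after difference in this output can be imitated with a toy WSGI callable. This is only a sketch: toy_app is a hypothetical stand-in that rewrites PATH_INFO in place, which is what the Django lines quoted above appear to do; it is not Django's actual code.

```python
from urllib.parse import unquote_to_bytes


def toy_app(environ, start_response):
    """Hypothetical app that decodes PATH_INFO and writes it back into environ."""
    raw_bytes = environ["PATH_INFO"].encode("latin-1")
    # In-place mutation: the caller's dict now holds the decoded path
    environ["PATH_INFO"] = raw_bytes.decode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


environ = {
    "PATH_INFO": unquote_to_bytes("/ab%C4%87%C4%8F%C4%99f").decode("latin-1"),
}
before = environ["PATH_INFO"]
body = toy_app(environ, lambda status, headers: None)
after = environ["PATH_INFO"]
```

Because the caller and the application share one dict, any later code that re-reads PATH_INFO expecting the raw form sees the decoded one instead.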

I don’t understand the internals sufficiently to say what the most correct course of action is:

  • should Webtest create a copy of the environment prior to calling req.get_response and restore it after the call?
  • should Webtest copy only PATH_INFO and restore it after getting the response?
  • should Webtest examine PATH_INFO after getting the response, and if it is Unicode, encode it back into a raw form?
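
To make the second and third options concrete, here is a minimal sketch of each. Both preserved_path_info and to_wsgi_native are hypothetical names, not existing Webtest APIs, and the sketch assumes UTF-8 as the URL encoding.

```python
from contextlib import contextmanager


@contextmanager
def preserved_path_info(environ):
    """Second option: remember PATH_INFO and restore it after the call."""
    saved = environ.get("PATH_INFO")
    try:
        yield environ
    finally:
        if saved is None:
            environ.pop("PATH_INFO", None)
        else:
            environ["PATH_INFO"] = saved


def to_wsgi_native(path_info: str, encoding: str = "utf-8") -> str:
    """Third option: re-encode a decoded path back into its raw latin-1 form.

    Heuristic: a value that fits in latin-1 is assumed to be raw already.
    """
    try:
        path_info.encode("latin-1")
    except UnicodeEncodeError:
        return path_info.encode(encoding).decode("latin-1")
    return path_info


env = {"PATH_INFO": "/\xc4\x89"}
with preserved_path_info(env):
    env["PATH_INFO"] = "/\u0109"   # simulate the application decoding the path
restored = env["PATH_INFO"]
```

The heuristic in to_wsgi_native is ambiguous when a decoded path consists only of characters below U+0100 (for example "é"), since such a value is indistinguishable from a raw one. Saving and restoring the value avoids that ambiguity.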
