[Webkit-unassigned] [Bug 87351] [BlackBerry] Cookie and Location header should be converted to latin-1/utf-8 in the same place.

bugzilla-daemon at webkit.org
Fri May 25 15:56:24 PDT 2012


https://bugs.webkit.org/show_bug.cgi?id=87351





--- Comment #10 from Adam Barth <abarth at webkit.org>  2012-05-25 15:55:28 PST ---
> ASCII is a subset of UTF-8, so I don't see the difference between processing it as ASCII and then using UTF-8 to decode bytes which are not valid ASCII, and just decoding as UTF-8.

Those two operations are different.  For example, consider a sequence of octets (like a UTF-8 BOM) that, when decoded, doesn't produce any Unicode characters.  If you first decode the header using UTF-8 and then attempt to parse it, you can get the wrong answer because that sequence of octets will have disappeared.

For this reason, it's not possible to correctly process HTTP headers, be they Cookie, Set-Cookie, or any other header, in Unicode.  You need to process them as sequences of octets in order to get the correct behavior.
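Here's a minimal sketch of the difference (plain C++, not WebKit code, with a made-up header value): if the value the server sent begins with a UTF-8 BOM and the decoder drops it, parsing the decoded string and parsing the octets give different answers.

#include <cassert>
#include <string>

int main()
{
    // Octets the server actually sent: a UTF-8 BOM (EF BB BF) followed by "sessionid=abc".
    std::string rawOctets = "\xEF\xBB\xBF" "sessionid=abc";

    // What is left if you decode as UTF-8 first and the decoder strips the BOM.
    std::string decodedFirst = "sessionid=abc";

    // The two parses disagree about the cookie's name, so anything keyed on
    // the name (lookup, replacement, eviction) can come out wrong.
    std::string octetName = rawOctets.substr(0, rawOctets.find('='));
    std::string decodedName = decodedFirst.substr(0, decodedFirst.find('='));
    assert(octetName != decodedName);
    return 0;
}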

The design of handleNotifyHeaderReceived is broken and cannot be fixed without changing its type:

void NetworkJob::handleNotifyHeaderReceived(const String& key, const String& value)

Specifically, the key and the value need to be changed from Unicode strings to sequences of octets.  I'm repeating myself, but it is not possible to correctly process HTTP headers in Unicode.
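Just to illustrate the shape I mean (a sketch, not a reviewed patch; WTF's CString is only one octet-holding type that would do), the declaration would need to become something like:

void NetworkJob::handleNotifyHeaderReceived(const CString& key, const CString& value)

with any conversion to a Unicode String happening after the octet-level parsing, and only where a character string is genuinely needed (e.g., for display).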

> All it says in RFC6265 is:
> 
>    NOTE: Despite its name, the cookie-string is actually a sequence of
>    octets, not a sequence of characters.  To convert the cookie-string
>    (or components thereof) into a sequence of characters (e.g., for
>    presentation to the user), the user agent might wish to try using the
>    UTF-8 character encoding [RFC3629] to decode the octet sequence.
>    This decoding might fail, however, because not every sequence of
>    octets is valid UTF-8.

Yes, I know what it says because I wrote it.

> Which implies that we can decode the whole header as UTF-8.

No, that's not what it says.  It says explicitly that cookie-string is actually a sequence of octets, not a sequence of characters.  If a user agent wishes to display the cookie-string to the user (e.g., using a font whose glyphs represent Unicode code points), then the user agent can try using UTF-8.  However, nothing in that note says that it's possible to meet the requirements in the spec by processing the cookie-string in Unicode.  It doesn't say that because it's not possible.
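For that presentation case, a port can try UTF-8 and fall back, e.g. (a sketch only; String::fromUTF8 in WTF returns a null String when the octets aren't valid UTF-8, and the Latin-1 fallback here is this port's choice, not something the RFC requires):

String displayValue = String::fromUTF8(data, length);
if (displayValue.isNull())
    displayValue = String(data, length); // fall back to Latin-1, for display only

But none of that changes how the cookie machinery itself has to treat the octets.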

> This contradicts the BNF, in fact, which defines cookie-octet to only allow ASCII, but some sites do send UTF-8 and others send Latin-1, so we have to deal.

Correct.  Not all servers send Set-Cookie headers that comply with the BNF.  That's why the RFC defines the precise handling of all sequences of octets that might be sent by servers.
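For example, the parsing algorithm in section 5.2 splits on ';' and '=' at the octet level without ever caring what encoding the surrounding bytes are in.  Roughly (my own simplification, leaving out the whitespace trimming and attribute handling the RFC also specifies):

#include <string>
#include <utility>

// Split a set-cookie-string, treated purely as octets, into its name and value.
std::pair<std::string, std::string> splitNameValue(const std::string& setCookieOctets)
{
    std::string nameValuePair = setCookieOctets.substr(0, setCookieOctets.find(';'));
    std::string::size_type equals = nameValuePair.find('=');
    if (equals == std::string::npos)
        return std::make_pair(std::string(), std::string()); // per the RFC, ignore the set-cookie-string
    return std::make_pair(nameValuePair.substr(0, equals), nameValuePair.substr(equals + 1));
}

The same split works whether the value bytes happen to be ASCII, Latin-1, or UTF-8.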

> It's possible that some of the components of a Set-Cookie header, like the domain, should cause the cookie to be rejected if they're not plain ASCII, but we're not doing this check for Set-Cookie (yet?)

The design of this code is broken.  The only way to correctly process HTTP headers is as sequences of octets.  Any attempt to process them in Unicode will not be correct.  Period.
