[Webkit-unassigned] [Bug 51035] UCS2 encoding aliases should be defaulted to Big Endian

Tue Dec 14 12:16:58 PST 2010

https://bugs.webkit.org/show_bug.cgi?id=51035

--- Comment #3 from Yong Li <yong.li.webkit at gmail.com>  2010-12-14 12:16:58 PST ---
(In reply to comment #2)
> (From update of attachment 76540 [details])
> > 4) IE and FireFox treat "ISO-10646-UCS-2" as Big Endian.
> 
> Did you test other encoding names modified here? Do all the added sub-tests pass in IE and Firefox?
> 
> It's surprising that we fail on every page that's labeled as "UTF-16" and doesn't have a BOM. In fact, I recall that the current behavior is intentional, and was chosen for compatibility with other browsers - but I don't remember details of testing.
> 
> +// UCS2 encoding aliases.
> +testDecode('ISO-10646-UCS-2', '%00a', 'U+0061');
> +testDecode('UCS-2', '%00a', 'U+0061');
> +testDecode('UTF-16', '%00a', 'U+0061');
> +testDecode('Unicode', '%00a', 'U+0061');
> +testDecode('csUnicode', '%00a', 'U+0061');
> +testDecode('unicodeFFFE', '%00a', 'U+0061');
> +testDecode('unicodeFEFF', 'a%00', 'U+0061');
> 
> This patch has confused terminology. Why did you label UTF-16 as a "UCS2 encoding alias"? It's clearly not.
> 
> UCS-2 is a historic name for 16-bit encoding defined by Unicode 1.0, and obsoleted by Unicode 2.0. I suspect that most or all clients treat it as UTF-16 in practice. ICU certainly does.
> 
> +testDecode('ISO-10646-UCS-2', '%00a', 'U+0061');
> 
> Please add tests for something besides zero bytes. While at it, please consider adding tests for whether surrogate pairs are supported for each of these encoding names.
> 
> Extending test coverage would is very welcome.

(In reply to comment #2)
> (From update of attachment 76540 [details])
> > 4) IE and FireFox treat "ISO-10646-UCS-2" as Big Endian.
> 
> Did you test other encoding names modified here? Do all the added sub-tests pass in IE and Firefox?

FireFox seems treat "Unicode" as UTF-8. But for other names, it uses Big Endian. Except, when using 0 bytes, it also works with little endian for "UTF-16".

> 
> It's surprising that we fail on every page that's labeled as "UTF-16" and doesn't have a BOM. In fact, I recall that the current behavior is intentional, and was chosen for compatibility with other browsers - but I don't remember details of testing.
> 
> +// UCS2 encoding aliases.
> +testDecode('ISO-10646-UCS-2', '%00a', 'U+0061');
> +testDecode('UCS-2', '%00a', 'U+0061');
> +testDecode('UTF-16', '%00a', 'U+0061');
> +testDecode('Unicode', '%00a', 'U+0061');
> +testDecode('csUnicode', '%00a', 'U+0061');
> +testDecode('unicodeFFFE', '%00a', 'U+0061');
> +testDecode('unicodeFEFF', 'a%00', 'U+0061');
> 
> This patch has confused terminology. Why did you label UTF-16 as a "UCS2 encoding alias"? It's clearly not.
> 
> UCS-2 is a historic name for 16-bit encoding defined by Unicode 1.0, and obsoleted by Unicode 2.0. I suspect that most or all clients treat it as UTF-16 in practice. ICU certainly does.
>

Can I say they are aliases of UTF-16 then?

> +testDecode('ISO-10646-UCS-2', '%00a', 'U+0061');
> 
> Please add tests for something besides zero bytes. While at it, please consider adding tests for whether surrogate pairs are supported for each of these encoding names.
> 
Will use surrogate pairs.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.