[webkit-reviews] review denied: [Bug 21635] Use UTF-8 for Unicode encodings other than UTF-8 when submitting a form or encoding the query part of a URL : [Attachment 24544] patch with layout tests

Fri Oct 24 11:59:36 PDT 2008

Darin Adler <darin at apple.com> has denied Jungshik Shin <jshin at chromium.org>'s
request for review:
Bug 21635: Use UTF-8 for Unicode encodings other than UTF-8 when submitting a
form or encoding the query part of a URL
https://bugs.webkit.org/show_bug.cgi?id=21635

Attachment 24544: patch with layout tests
https://bugs.webkit.org/attachment.cgi?id=24544&action=edit

------- Additional Comments from Darin Adler <darin at apple.com>
Looks great!

> +    // Treat UTF-16, UTF-32, UTF-7 as UTF-8
> +    init(base, relative, encoding.encodingForFormSubmission());

I think this comment is unclear. The comment really means "using the
encodingForFormSubmission function will ..." and that's clear right now when
we're making the change but someone coming later to look at this function won't
necessarily understand that. Could you write a comment that says that a little
more specifically?

It seems unfortunate that the URL code is calling encodingForFormSubmission. In
what way is URL encoding the same thing as form submission? Should there be a
differently named synonym here instead? Or maybe the comment can explain that
encoding of queries is essentially a "part of form submission"?

> -static void addEncodingName(HashSet<const char*>& set, const char* name)
> +namespace {
> +
> +void addEncodingName(HashSet<const char*>& set, const char* name)

Throughout the WebKit project, we've been using static to get internal linkage
instead of using anonymous namespaces, with very few exceptions. I understand
that you can accomplish the same thing with an anonymous namespace, but is
there a reason to start using that technique here? I'd prefer to make the
change only if there's a good reason to do so. Using a mix of the two is
probably less clear that consistently using one or the other.

The anonymous namespace technique doesn't work as well with some debuggers, for
example, so there needs to be an advantage to offset that disadvantage.

> +bool TextEncoding::isNonByteBasedEncoding() const
> +{
> +    return *this == UTF16LittleEndianEncoding() ||
> +	      *this == UTF16BigEndianEncoding() ||
> +	      *this == UTF32BigEndianEncoding() ||
> +	      *this == UTF32LittleEndianEncoding();
> +}

Our coding style is to put such operators at the beginnings of lines rather
than the ends of lines.

For this new function you've used the term "ByteBased", and for the existing
function we used the term "8Bit". I'd like to see the functions use the same
terminology. So we could rename the old one or the new one.

> +// TODO(jungshik) : Do we have to care about BOCU, SCSU, and CESU-8?
> +// Those encodings along with UTF-{7,8,16,32} come with 'ICU'.

This is not the format we use to leave comments behind. We always use FIXME and
we don't attach our names in that format. Also, I think this should be clearer.
Try to write a comment someone can notice later.

> +// HTML5
> +// specifies that UA should not support UTF-7, BOCU, SCSU and CESU-8. 
> +// To block them, closest8BitEquivalent (called in TextResourceLoader)
> +// should be changed. 

I don't think this comment is in the right place. It's not about
encodingForFormSubmission, is it? I don't understand what "blocking" these
character sets would entail, but it seems that we wouldn't want to decode them
at all.

I'm going to say review- but these comments are really minor issues.