[webkit-reviews] review denied: [Bug 16538] KURL should use String instead of DeprecatedString : [Attachment 18785] Patch

Wed Jan 30 06:06:52 PST 2008

Darin Adler <darin at apple.com> has denied Marvin Decker
<marv.decker at gmail.com>'s request for review:
Bug 16538: KURL should use String instead of DeprecatedString
http://bugs.webkit.org/show_bug.cgi?id=16538

Attachment 18785: Patch
http://bugs.webkit.org/attachment.cgi?id=18785&action=edit

------- Additional Comments from Darin Adler <darin at apple.com>
This patch looks great! I particularly like protocolIs().

> but it is challenging because the parsing code relies on the utf-16/utf-8
duality of the string

I'm not sure what you mean here. DeprecatedString does not have UTF-16/UTF-8
duality. It does have a feature where you can just call ascii() on it and see
the characters as an 8-bit string but:

   1) it doesn't actually do anything really efficient -- it pretty much just
allocates a new 8-bit buffer
   2) it chops characters and does not do a UTF-16 to UTF-8 conversion
   3) there's very little code that actually takes advantage of this

Would you be willing to take another pass at that aspect of the patch? I ask
because I think it would be better to not land this with the comments "this
optimization is broken for now". I really don't think we need to stage this so
that there's a stage with the broken optimization.

String is inefficient if you do thinks like concatenate; any modifications to
string length require reallocating the buffer. The same is not true about
DeprecatedString. So I'm concerned about replacing functions that build up a
DeprecatedString by appending with functions that do the same thing with
String. This is the most challenging issue to be considered when replacing
DeprecatedString with String.

encodeHostnames and substituteBackslashes are examples of this. To do this
efficiently with String you need to build up the new string as a Vector<UChar>
and then use adopt() to turn it into a string once done modifying it. There
many not be the appropriate helper functions yet for String and Vector<UChar>,
and we've even considered adding a new StringBuilder class.

This is one of the reasons I personally haven't converted these functions yet
yet. Just converting to String and hoping that performance is acceptable may
not produce acceptable results. Please consider Vector<UChar> for functions
that build strings by appending. It's pretty straightforward, and anything
inconvenient can probably be papered over with a few helper functions.

The mix of signed and unsigned integers is starting to get a little awkward.
Since our existing string functions mostly use int, I'd prefer that you
continue that to avoid ugly casts in new code. Or we could move in the other
direction and try to move away from using int for string character indices. I'm
not pleased with the mix.

When checking for the 0 character this new code has a mix of styles. Sometimes
it's "!c", other times "c == 0", other times "c == '\0'". I'd prefer more
consistency.

-    ASSERT(!pageURLOriginal.isNull());

Was this assertion not actually guaranteed? If so, then I understand the bug
fix. If not, then I'm puzzled by why you made the change here.

+static bool isAllASCII(const String& str)
+{
+    for (unsigned i = 0; i < str.length(); i++) {
+	 if (str[i] > 0x7f)
+	     return false;
+    }
+    return true;
+}

This function already exists in RenderText.cpp, although it works on a
StringImpl rather than a String. I would prefer that we unify and share a
single one rather than adding another. The RenderText.cpp one uses an algorithm
that's considerably faster for long strings where characters actually are all
ASCII, and since all-ASCII is the normal case I'd prefer to do it that way.

+// ASCII. The detination buffer must be large enough, and it is not NULL
+// terminated unless the source has a NULL.

Typo. Missing the "s" in destination.

Normally NULL refers the pointer value and not the 0 character. The 0 character
is usually called NUL, with one L. Or you can say "null-terminated" and "null
character" and not SHOUT about the null character. There are many other places
you call this character NULL -- those should all be fixed.

+	 dest[i] = static_cast<char>(src[i]);

There's no need for an explicit cast here. I don't know of any compiler WebKit
supports that gives a warning for narrowing conversions like this one. And
while it would be nice to be explicit that there's a type conversion going on,
I think the general guideline to avoid casts when possible is probably slightly
more important.

+// Returns the index of the first index in string |s| of any of the characters

+// in |toFind|. |toFind| should be a NULL-terminated string, all characters up

+// to the NULL will be searched. Returns UINT_MAX if not found.
+unsigned findFirstOf(const String& s, unsigned startPos, const char* toFind)

I would prefer that if you need something like this that you add it to the
String class rather than put in a local function inside KURL. Or if you really
think it should be local, it should be marked static so it gets internal
linkage.

If you went to add it to the String class you'd see the mismatch, where you're
using unsigned and the other functions use int. It would be best if we were
consistent about this. If I was starting a new class I'd probably use unsigned,
but I'd prefer to not have the single new function go in a different direction
from all the existing ones. We could change them all in the future if you think
unsigned is better for this sort of thing.

+	 // Note that we don't pass anything as the second argument. When the
+	 // innards of KURL are converted from DeprecatedString to String, this

+	 // should be put back. It is an optimization that prevents allocating
+	 // another string in the common case that nothing changes.

Are you sure it's OK to lose this optimization? I'm pretty sure it's a real
issue. We added this optimization for a reason, and I don't think we should
take it out lightly.

+    for (i = 0; i < schemeEndPos; i++) {
+	 if (!proto[i] || toASCIILower(urlString[i].unicode()) != proto[i])
+	     return false;
+    }
+    return proto[i] == 0;  // We should have consumed all characters in the
argument.

Why use ! inside the loop and == 0 outside the loop? I'd prefer that you be
consistent. I think we should add an assert here that the passed-in protocol
does not contain any ASCII uppercase characters or non-ASCII characters. I also
think that the name "proto" isn't so great, since it could easily mean
"prototype" instead. Why not use the entire word protocol?

-    return protocol() == "file";
+    return protocol().lower() == "file";

This should be using protocolIs().

+    // DANGER! Thos does not do things like IDN, and the resulting host name

Typo "this" not "thos".

+    // Returns true if this URL hsa a path. Note that "http://foo.com/" has a

Typo "has" not "hsa".

+    // Returns true of the current URL's protocol is the same as the NULL-

Typo "if" not "of".

+    void setProtocol(const String &);
+    void setHost(const String &);

There should not be spaces before these "&" symbols. In general it would be
good if you fixed the positions of * and & characters even in existing code you
touch. Our WebKit coding style is Type* and Type& rather than Type * and Type
&.

For the functions where you're adding comments before the function declaration,
please also add blank lines to group the comments with the functions they are
describing. I find this hard to read:

    void f();
    // Pass a full path.
    void g(const char*).
    void h();

I'd prefer that there be a blank line after f and before h in a case like this.

+	 m_frame->loader()->begin(KURL("http://localhost/placeholder.svg")); //
create the empty document

Why is http a good idea here? Can we come up with something better? I think it
would be bad if something decided to do I/O and talk to the local HTTP server!
That could well be a valid URL to a different SVG.

There are enough comments here that I'm going to say review- for now, but this
seems like great work. Thanks for taking this on.