[webkit-reviews] review requested: [Bug 16538] KURL should use String instead of DeprecatedString : [Attachment 18947] New patch addressing most comments

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Feb 5 20:01:37 PST 2008


Marvin Decker <marv.decker at gmail.com> has asked  for review:
Bug 16538: KURL should use String instead of DeprecatedString
http://bugs.webkit.org/show_bug.cgi?id=16538

Attachment 18947: New patch addressing most comments
http://bugs.webkit.org/attachment.cgi?id=18947&action=edit

------- Additional Comments from Marvin Decker <marv.decker at gmail.com>
MacDome's review:

I think all additions of String( are probably bogus.  String does not have any
explicit constructors (I don't think).

You are correct, I removed these.

> you could update them to match our style guidelines and remove the { }. 
That's
> not required for this patch, but it's good to know for the future at least.

I fixed a bunch of these, although I don't pretend to have fixed all cases.

> All * and & should be next to the argument names.  Per our style guidelines.

...as well as these when I noticed them.

I fixed your other suggestions.

-----------
Darin's review:

> The mix of signed and unsigned integers is starting to get a little awkward.

It's unfortunate that String uses both unsigned and int to refer to string
indices. With this design, it is impossible for any code using strings to be
very consistent. I changed some things when I got unsigneds out of string to
use int. I have changed some of these back and used static cast on
string.length instead. The code is a little more consistent now.

> encodeHostnames and substituteBackslashes are examples of this.

I replaced these functions with the versions from the patch in the middle I
gave up on. These worked well, and were not the cause of the problems I
reported (the problems were in KURL::parse).

The rest of the functions still use DeprecatedString when concatenating, with
the exception of substituteBackslashes. This function isn't very commonly used,
it only does one string concat, and to replace, I would have to write my own
version of replace() which seemed like a waste of time to me. I can change this
too if you really want.

> -    ASSERT(!pageURLOriginal.isNull());
>
> Was this assertion not actually guaranteed? If so, then I understand the bug
> fix. If not, then I'm puzzled by why you made the change here.

That's correct. The code that calls it is closed source, but I think it gets
the URL from some frame or loader related class, and blindly passes it to this
function. Because old KURL would lose the NULLness of strings, it would be an
empty but non-NULL URL in some cases when now it is NULL when it was
constructed as a NULL KURL (not sure exactly what cases those are).

Was this assertion not actually guaranteed? If so, then I understand the bug
fix. If not, then I'm puzzled by why you made the change here.

> +static bool isAllASCII(const String& str)
> ...
> This function already exists in RenderText.cpp, although it works on a
> StringImpl rather than a String. I would prefer that we unify and share a
> single one rather than adding another. The RenderText.cpp one uses an
algorithm
> that's considerably faster for long strings where characters actually are all

> ASCII, and since all-ASCII is the normal case I'd prefer to do it that way.

FIXME FIGURE THIS OUT

> +	   dest[i] = static_cast<char>(src[i]);
>
> There's no need for an explicit cast here. I don't know of any compiler
WebKit
> supports that gives a warning for narrowing conversions like this one. And
> while it would be nice to be explicit that there's a type conversion going
on,
> I think the general guideline to avoid casts when possible is probably
slightly
> more important.

You are mistaken, Visual C++ gives:
warning C4244: '=' : conversion from 'unsigned short' to 'char', possible loss
of data.
I think it's very important when you are losing data to know that you meant to
lose it when you read the code, even if the compiler issued no warning.

> +unsigned findFirstOf(const String& s, unsigned startPos, const char* toFind)

>
> I would prefer that if you need something like this that you add it to the
> String class rather than put in a local function inside KURL. Or if you
really
> think it should be local, it should be marked static so it gets internal
> linkage.

I forgot the static here, I have added it. This is used on Vectors as well, so
I can't add it to the String class.

I think a general problem here with this function, isAllASCII, and
substituteBackslashes is that you are requiring that String not be used for
this type of manipulation (which I understand), while WebKit lacks the
infrastructure to do anything beyond stdlib outside of the String/StringImpl
class. I think there are four options: move all StringImpl's guts out to a file
that takes lengths and offsets, make StringImpl able to use a buffer it doesn't
own, have a nice place to put this type of function outside of String[Impl], or
accumulate a whole lot of custom-written string manipulation functions in each
file.

Fixing this is decidedly outside the scope of this patch, but should be
considered soon.

> +	   // Note that we don't pass anything as the second argument. When the

> +	   // innards of KURL are converted from DeprecatedString to String,
this
> +	   // should be put back. It is an optimization that prevents
allocating
> +	   // another string in the common case that nothing changes.
>
> Are you sure it's OK to lose this optimization? I'm pretty sure it's a real
> issue. We added this optimization for a reason, and I don't think we should
> take it out lightly.

I think this is a nice optimization. See my previous comment about why I don't
think this is practical in this patch. Plus, since probably half of the callers
of the string class have a DeprecatedString to start out with, you already lost
the optimization for those callers. The cost of not doing this is only the cost
of doing a String<->DeprecatedString conversion, which is also done all over.

I think an even better way of improving this would be to reduce the number of
KURL->String->KURL operations. A large number of the cases where the string
doesn't change and this optimization could apply are a result of this sequence
of events. Making more things use KURL instead of String would make it even
faster than this optimization. I have filed a few bugs on this and am hoping to
do some work in this area once this patch is in.

After talking to MacDome, I think fixing bug 17098 would be a faster and,
surprisingly, a less risky way of fixing this optimization. I would consider my
KURL-to-use-String-internally patch pretty risky without any tests.


> +    for (i = 0; i < schemeEndPos; i++) {
> +	   if (!proto[i] || toASCIILower(urlString[i].unicode()) != proto[i])
> +	       return false;
> +    }
> +    return proto[i] == 0;  // We should have consumed all characters in the
> argument.
>
> Why use ! inside the loop and == 0 outside the loop? I'd prefer that you be
> consistent. I think we should add an assert here that the passed-in protocol
> does not contain any ASCII uppercase characters or non-ASCII characters. I
also
> think that the name "proto" isn't so great, since it could easily mean
> "prototype" instead. Why not use the entire word protocol?

I fixed the name and 0 comparisons, and added the assertion you requested.

I fixed the grammar and spelling errors, and placement of '&' in function
prototypes.

> +	   m_frame->loader()->begin(KURL("http://localhost/placeholder.svg"));
//
> create the empty document

> Why is http a good idea here? Can we come up with something better? I think
it
> would be bad if something decided to do I/O and talk to the local HTTP
server!
> That could well be a valid URL to a different SVG.

Do you think KURL() would be OK here? I changed it to this.

---------
The main thing I didn't address is where isAllASCII should live. I would like
some specific guidance on this given that I don't always have a String to call
it on.


More information about the webkit-reviews mailing list