[webkit-reviews] review denied: [Bug 56099] Introduce WTF HexNumber class : [Attachment 85587] Patch v5

Sat Mar 12 06:18:21 PST 2011

Darin Adler <darin at apple.com> has denied Nikolas Zimmermann
<zimmermann at kde.org>'s request for review:
Bug 56099: Introduce WTF HexNumber class
https://bugs.webkit.org/show_bug.cgi?id=56099

Attachment 85587: Patch v5
https://bugs.webkit.org/attachment.cgi?id=85587&action=review

------- Additional Comments from Darin Adler <darin at apple.com>
View in context: https://bugs.webkit.org/attachment.cgi?id=85587&action=review

Looks like some good changes, but this is too much in a single patch. I suggest
one patch about using more efficient idioms for building strings that do not
involve hex, and a separate patch focusing on hex.

One of the main ideas here seems to be “StringBuilder is better than Vector”,
but I’m not sure it always is. I’d like to see a patch that adopts makeString
more. Another patch that gets rid of intermediate strings when using
String::number (especially valuable with makeString), and then perhaps a patch
that moves from Vector to StringBuilder in the places where we build up strings
in a loop, but that Vector -> StringBuilder one will require more thought.

> Source/JavaScriptCore/wtf/HexNumber.h:27
> +class HexNumber {

This seems like a namespace, not a class.

I’m not sure the hexadecimal number processing functions need a namespace or
class. If the functions themselves had good enough names then they could be at
the top level in WTF.

> Source/JavaScriptCore/wtf/HexNumber.h:29
> +    enum ConversionMode {

Making this enum a member of the class HexNumber does avoid polluting the
global namespace, but at a cost of having to say HexNumber::Uppercase every
time. I’d prefer an enum at namespace scope if we can give it a clear enough
name.

> Source/JavaScriptCore/wtf/HexNumber.h:30
> +	   LowerCase = 1 << 0,

We normally use the word “lowercase”, not the two words “lower case”.

> Source/JavaScriptCore/wtf/HexNumber.h:31
> +	   UpperCase = 1 << 1,

We normally use the word “uppercase”, not the two words “lower case”.

> Source/JavaScriptCore/wtf/HexNumber.h:32
> +	   Compressed = 1 << 2 // eg. #000 (used in the CSSParser)

I think the Compressed option is confusing. It seems to just mean “single
digit”. There has to be some cleaner way to expose that. Maybe just an entirely
separate function for a single digit. I don’t think that having the logic for
compressed mode here (use 1 digit if the number fits) makes a lot of sense.
It’s a pretty idiosyncratic requirement.

> Source/JavaScriptCore/wtf/HexNumber.h:35
> +    static String charToString(unsigned char number, unsigned mode =
LowerCase)

This seems like an inefficient function that would rarely be useful.

A string like this could almost never be used alone, and allocating two memory
blocks just to make the string seems like major overhead if we are just going
to concatenate it to something else. What call sites actually need this? Can we
find another way to do this that’s more efficient? I’d like to see this tied in
with the efficient mechanisms for building strings rather than providing a
function that constructs a single string.

If we want this to fit into the makeString string concatenation mechanism, we
want to come up with a way to fit it in without allocating a temporary string.
We can probably do that by coming up with a struct that contains a small fixed
size array of UChar with a length and making sure makeString can handle that.

I don’t think that “char” is a good name for a single byte. We’re stuck with it
in the C++ language, but in our function names I would prefer the word “byte”
for cases where “char” does not mean “character”, and the word “character” for
cases where it does.

Just personal bias, but I prefer uppercase as a default. Depends on clients in
the code I suspect.

I’m a little surprised that the type of the argument is unsigned rather than
ConversionMode.

I don’t think it’s a good idea to make this an inline function. In fact, the
header would read a lot clearer if we separated function declarations from
definitions, even if some of the definitions need to be inline. That goes for
templates too.

> Source/JavaScriptCore/wtf/HexNumber.h:44
> +	   Vector<UChar, 2> buffer;
> +	   buffer.resize(2);
> +	   buffer[0] = hexDigits[number >> 4];
> +	   buffer[1] = hexDigits[number & 0xF];
> +	   return String::adopt(buffer);

Using a vector here is not a good idea. Instead it should be more like this:

    UChar digits[2] = { hexDigits[number >> 4], hexDigits[number & 0xF] };
    return String(digits, 2);

But as above, I question the need for this function.

> Source/JavaScriptCore/wtf/HexNumber.h:48
> +    template<typename T, size_t inlineCapacity>
> +    static void appendCharToVector(unsigned char number, Vector<T,
inlineCapacity>& destination, unsigned mode = LowerCase)

I think it would be better to just call this appendByteAsHex, have it not be a
class member and not make it vector-specific. The only thing this relies on is
an append function.

    template<typename T> void appendByteAsHex(T& destination, unsigned char
byte, ConversionMode);

> Source/JavaScriptCore/wtf/HexNumber.h:61
> +    static String unsignedToString(unsigned number, int stopAfterDigits =
-1, unsigned mode = LowerCase)

Same comments as above about functions returning strings. The use of -1 to mean
don’t stop seems awkward. I think we should instead have a separate function
without a minimum number of digits.

stopAfterDigits seems the wrong name for this if it’s trying to replicate
String::format. The String::format argument is minimumDigits or something like
that.

We could make something that does a fixed number of digits.

I think the comment that this can replace String::format is a little strange.

Should assert that the fixed number of digits is a reasonable one.

> Source/JavaScriptCore/wtf/HexNumber.h:83
> +	   String hexString = unsignedToString(number, stopAfterDigits, mode);

We definitely don’t want to make a string first. We should make the append
function and then we could build a string version out of that.

> Source/JavaScriptCore/wtf/HexNumber.h:96
> +	   static const char s_lowerHexDigits[17] = "0123456789abcdef";
> +	   static const char s_upperHexDigits[17] = "0123456789ABCDEF";

These global variables are local to the function, and so don’t need to have any
prefix at all. They are not data members, and so “s_” doesn’t apply even though
they keyword, “static”, is the same.

> Source/JavaScriptCore/wtf/text/WTFString.h:-462
> -inline void appendNumber(Vector<UChar>& vector, unsigned char number)

Why did you delete this? It’s not about hex, so deleting it does not seem
directly related to your patch.

> Source/WebCore/css/CSSOMUtils.cpp:59
> +    String hexString = HexNumber::unsignedToString(c);
> +    appendTo.append('\\');
> +    appendTo.append(hexString.characters(), hexString.length());
> +    appendTo.append(' ');

This should be using the function for appending to vectors, not the function
for making strings.

> Source/WebCore/css/CSSParser.cpp:6403
> +	       Vector<char, 2> destination;
> +	       HexNumber::appendCharToVector<char, 2>(ch, destination,
HexNumber::LowerCase | HexNumber::Compressed);

We need to do this without allocating memory. Your new code is much slower than
the old code because it allocates a temporary vector which it then throws away.

Probably the best way is to figure out a way to hook the append template up to
the buffer[index++] operation. If we had an object with an append function that
did buffer[index++] that would do the trick.

> Source/WebCore/css/CSSPrimitiveValue.cpp:638
> -	       text = formatNumber(m_value.num) + "%";
> +	       text = makeString(formatNumber(m_value.num), "%");

This is good--much better to use makeString rather than string
concatenation--but not directly related to your new hex work. I think you
should land this separately. If we change the return value of formatNumber so
we don’t have to allocate a string we can make this much more efficient, but
again not part of the hex patch.

> Source/WebCore/css/CSSPrimitiveValue.cpp:730
> +	       StringBuilder builder;
> +	       builder.reserveCapacity(32);

This would be much more efficient with either a Vector<UChar> or a single call
to makeString. Your change to use StringBuilder is probably making it a bit
less efficient.

StringBuilder is not a very good class unless you happen to already have
everything in String objects; otherwise we do memory allocations for each
string while building the string and then throw away all the temporary strings
after the fact.

Vector is a little because it uses only a single memory block to build the
vector, possibly reallocating it a few times, then a second memory block the
string at the end.

makeString is even better because it just allocates the entire string all at
once and can take advantage of the optimization where we put the StringImpl and
characters into a single block. We can easily use makeString here by doing the
hasAlpha work completely separately. But to really make performance good we
will also want a function that’s like String::number but that works without
allocating a new string, as I suggested for hex.

But again, this improvement is not related to hex so should probably should not
be mixed in with the hex changes.

> Source/WebCore/css/CSSPrimitiveValue.cpp:752
> +	       StringBuilder builder;

Better to minimize the number of intermediate strings a bit more. Make use
makeString with more arguments at a time. I guess StringBuilder is OK for this.
Not sure how it compares to Vector if we do everything else as efficiently as
possible.

> Source/WebCore/dom/DatasetDOMStringMap.cpp:55
> -    Vector<UChar> newStringBuffer;
> +    StringBuilder stringBuilder;

As written this is extremely inefficient. It’s not using the buffer in
StringBuilder and so it’s actually allocating a new String for *every
character*. So please don’t make this change unless you think it through a bit
more. As is I believe it actually makes performance of this function
pathologically bad.

> Source/WebCore/editing/HTMLInterchange.cpp:61
> -    Vector<UChar> s;
> +    StringBuilder s;

Not sure StringBuilder actually ends up giving us better performance in this
function. We’re probably better off with the vector, but I am not sure. This
probably needs some testing. Since the whole point is to make it more
efficient, I suggest performance testing of some sort.

> Source/WebCore/inspector/InspectorResourceAgent.cpp:467
>      ASSERT(length > 0);
> -    static const char hexDigits[17] = "0123456789ABCDEF";
> -    size_t bufferSize = length * 3 - 1;
> -    StringBuffer buffer(bufferSize);
> -    size_t index = 0;
> +    Vector<UChar> buffer;
> +    buffer.reserveInitialCapacity(length * 3 - 1);
>      for (size_t i = 0; i < length; ++i) {
>	   if (i > 0)
> -	       buffer[index++] = ':';
> -	   buffer[index++] = hexDigits[value[i] >> 4];
> -	   buffer[index++] = hexDigits[value[i] & 0xF];
> +	       buffer.append(':');
> +	   HexNumber::appendCharToVector<UChar>(value[i], buffer,
HexNumber::UpperCase);
>      }
> -    ASSERT(index == bufferSize);
>      return String::adopt(buffer);

I think this new code results in an additional buffer compared to the old,
giving us a two-buffer string instead of an “all in one buffer” string. Not
sure.

> Source/WebCore/loader/DocumentThreadableLoader.cpp:135
> -	   Vector<UChar> headerBuffer;
> +	   StringBuilder headerBuilder;

This might be better, but the improvement seems marginal. One of the premises
of your patch seems to be that StringBuilder is almost always better than using
a vector. But I am not sure that’s the case.

It would be best if the patches were much clearer. One way is to make one patch
about simple examples where we know the performance tradeoff is absolutely
correct and easy to review. Then we could do a different patch about more
complex subtle cases.

> Source/WebCore/page/SecurityOrigin.cpp:400
> -    Vector<UChar> result;
> -    result.reserveInitialCapacity(m_protocol.length() + m_host.length() +
10);
> -    append(result, m_protocol);
> -    append(result, "://");
> -    append(result, m_host);
> -
> -    if (m_port) {
> -	   append(result, ":");
> -	   append(result, String::number(m_port));
> -    }
> +    StringBuilder result;
> +    result.reserveCapacity(m_protocol.length() + m_host.length() + 10);
> +    result.append(makeString(m_protocol, "://", m_host));
> +
> +    if (m_port)
> +	   result.append(makeString(":", String::number(m_port)));
>  
> -    return String::adopt(result);
> +    return result.toString();

makeString would be better than either StringBuilder or Vector here. Not sure
StringBuilder is better than Vector but this is actually a small fixed string.

> Source/WebCore/platform/FileSystem.cpp:93
> -	       *p++ = hexDigits[(c >> 4) & 0xF];
> -	       *p++ = hexDigits[c & 0xF];
> +	       HexNumber::appendCharToVector<UChar, 512>(c, buffer,
HexNumber::UpperCase);

I think you broke this code. This is writing with *p++ and you replaced that
with buffer.append which will not do the same thing. Needs testing.

> Source/WebCore/platform/KURL.cpp:963
> +    Vector<char, 2> destination;
> +    HexNumber::appendCharToVector<char, 2>(c, destination,
HexNumber::UpperCase);
> +    ASSERT(destination.size() == 2);
>      *buffer++ = '%';
> -    *buffer++ = hexDigits[c >> 4];
> -    *buffer++ = hexDigits[c & 0xF];
> +    *buffer++ = destination[0];
> +    *buffer++ = destination[1];

Using a vector here seems weak and awkward. We need to make sure the hex code
can handle these *buffer++ cases; a simple way would be to have an adapter with
an append function, but another way is to simply offer the *p++ idiom as a
function.

> Source/WebCore/platform/UUID.cpp:115
>      StringBuilder builder;
> -    builder.append(String::format("%08x", randomData[0]));
> +    builder.append(HexNumber::unsignedToString(randomData[0], 8));
>      builder.append("-");
> -    builder.append(String::format("%04x", randomData[1] >> 16));
> +    builder.append(HexNumber::unsignedToString(randomData[1] >> 16, 4));
>      builder.append("-4");
> -    builder.append(String::format("%03x", randomData[1] & 0x00000fff));
> +    builder.append(HexNumber::unsignedToString(randomData[1] & 0x00000fff,
3));
>      builder.append("-");
> -    builder.append(String::format("%x", (randomData[2] >> 30) | 0x8)); //
Condense this byte to 8, 9, a, and b.
> -    builder.append(String::format("%03x", (randomData[2] >> 16) &
0x00000fff));
> +    builder.append(HexNumber::unsignedToString((randomData[2] >> 30) | 0x8,
1));
> +    builder.append(HexNumber::unsignedToString((randomData[2] >> 16) &
0x00000fff, 3));
>      builder.append("-");
> -    builder.append(String::format("%04x", randomData[2] & 0x0000ffff));
> -    builder.append(String::format("%08x", randomData[3]));
> +    builder.append(HexNumber::unsignedToString(randomData[2] & 0x0000ffff,
4));
> +    builder.append(HexNumber::unsignedToString(randomData[3], 8));
> +    builder.append("\n");

This should use a giant makeString, and not a builder. And is also a case that
would benefit a lot from a way to do hex numbers without intermediate String
objects.

> Source/WebCore/platform/graphics/Color.cpp:191
> +    StringBuilder builder;
> +    builder.reserveCapacity(28);
> +    builder.append(makeString("rgba(", String::number(red()), ", ",
String::number(green()), ", ", String::number(blue()), ", "));

Again, makeString is great, StringBuilder, not so great.