[webkit-reviews] review granted: [Bug 45085] [WTFURL] Add a mechanism for classifying types of characters : [Attachment 66315] Patch

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Wed Oct 13 00:32:19 PDT 2010


Maciej Stachowiak <mjs at apple.com> has granted Adam Barth <abarth at webkit.org>'s
request for review:
Bug 45085: [WTFURL] Add a mechanism for classifying types of characters
https://bugs.webkit.org/show_bug.cgi?id=45085

Attachment 66315: Patch
https://bugs.webkit.org/attachment.cgi?id=66315&action=review

------- Additional Comments from Maciej Stachowiak <mjs at apple.com>
View in context: https://bugs.webkit.org/attachment.cgi?id=66315&action=review

r=me If the above comments are addressed.

> JavaScriptCore/wtf/url/src/URLCharacterTypes.cpp:37
> +    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00 - 0x0f
> +    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x10 - 0x1f
> +    0, // 0x20  ' ' (escape spaces in queries)

Not sure about this, but maybe instead of 0 for no flags, there should be a
symbolic constant like InvalidCharacter.

> JavaScriptCore/wtf/url/src/URLCharacterTypes.h:42
> +    // Bits that identify different character types. These types identify
different
> +    // bits that are set for each 8-bit character in the characterTypeTable.


This comment seems redundant. The enum and its values are fairly
self-documenting.

> JavaScriptCore/wtf/url/src/URLCharacterTypes.h:50
> +    enum CharTypes {
> +	 QueryCharacter = 1,
> +	 UserInfoCharacter = 2,
> +	 IPv4Character = 4,
> +	 HexCharacter = 8,
> +	 DecimalCharacter = 16,
> +	 OctalCharacter = 32,
> +    };

WebKit convention is usually to use a shifted 1 as the value for enums that are
meant to be used with masking (1 << 0, 1 << 1, 1 << 2, etc).

Might also want to look at how the similar character classes are documented in
KURL.cpp.

(On a personal note, I find it surprising that both URL parsers have a
character flags table but have completely different classes!)

> JavaScriptCore/wtf/url/src/URLCharacterTypes.h:58
> +    // This table contains the flags in CharTypes for each 8-bit character.
> +    // Some canonicalization functions have their own specialized lookup
table.
> +    // For those with simple requirements, we have collected the flags in
one
> +    // place so there are fewer lookup tables to load into the CPU cache.
> +    //
> +    // Using an unsigned char type has a small but measurable performance
benefit
> +    // over using a 32-bit number.

This comment seems overly verbose and the second part in particular is needless
detail, unless there's a reason to believe someone would accidentally change
the "unsigned char" to "unsigned". If some of these details do need to be
documented, then perhaps the comment should go in the implementation file,
since this is a private static data member.


More information about the webkit-reviews mailing list