[webkit-reviews] review granted: [Bug 93742] HTML Parser should produce 8bit substrings for inline style and script elements : [Attachment 158867] Patch updated with suggested changes including using PACKUSWB

Thu Aug 16 12:39:45 PDT 2012

Benjamin Poulain <benjamin at webkit.org> has granted Michael Saboff
<msaboff at apple.com>'s request for review:
Bug 93742: HTML Parser should produce 8bit substrings for inline style and
script elements
https://bugs.webkit.org/show_bug.cgi?id=93742

Attachment 158867: Patch updated with suggested changes including using
PACKUSWB
https://bugs.webkit.org/attachment.cgi?id=158867&action=review

------- Additional Comments from Benjamin Poulain <benjamin at webkit.org>
View in context: https://bugs.webkit.org/attachment.cgi?id=158867&action=review

I have a few comments but everything seems good to me.

I wonder if this patch will also help with the cascading
StringImp::getData16SlowCase() we get later in the execution.

> Source/WTF/wtf/Alignment.h:60
> -
> +    

Whilespaces.

> Source/WTF/wtf/text/ASCIIFastPath.h:25
> +#if OS(DARWIN) 

Because of iOS, I think you also need "&& (CPU(X86) || CPU(X86_64))".

> Source/WTF/wtf/text/ASCIIFastPath.h:104
> +#if OS(DARWIN)

Ditto for iOS.

> Source/WTF/wtf/text/ASCIIFastPath.h:108
> +    unsigned i = 0;

This should probably be size_t since length is also size_t.

> Source/WTF/wtf/text/ASCIIFastPath.h:131
> +    for (; i < length; ++i) {
> +	   ASSERT(!(source[i] & 0xff00));
> +	   destination[i] = source[i];
> +    }

If you want this assertion, you should also have it in the prefix code.

Maybe you should just add an ALWAYS_INLINE function doing the loop, and use it
in the prefix, postfix and !OS(darwin) version.

> Source/WTF/wtf/text/WTFString.h:104
> +template<bool isSpecialCharacter(UChar), typename CharType>
> +bool isAllSpecialCharacters(const CharType*, size_t);

More and more we tend to use CharacterType instead of CharType.

> Source/WTF/wtf/text/WTFString.h:573
> -template<bool isSpecialCharacter(UChar)> inline bool
isAllSpecialCharacters(const UChar* characters, size_t length)
> +template<bool isSpecialCharacter(UChar), typename CharType>
> +inline bool isAllSpecialCharacters(const CharType* characters, size_t
length)

CharType -> CharacterType would be nice.

> Source/WebCore/ChangeLog:13
> +	   Currently all data associated with a token is stored and processed
as UChars.
> +	   Added code to determine that the contents of token data is all 8 bit
by keeping
> +	   the logical OR value of all prior characters.  Also added a flag
that the parser
> +	   can set to indicate when the token data is converted to a String
that we want
> +	   to make an 8 bit string if possible.  Enabled this handling for
script, style,
> +	   iframe, noembed, noframes, noscript and xmp tags.

Double space after each period.

> Source/WebCore/xml/parser/MarkupTokenBase.h:307
> +    void setConvertTo8Bit()

setConvertTo8Bit() -> setConvertTo8BitIfPossible()?

The method name setConvertTo8Bit() seems to imply the token is always gonna be
converted to 8bits.