[webkit-reviews] review denied: [Bug 71331] Towards 8 Bit Strings: Templatize JSC::Lexer class by character type : [Attachment 113262] Proposed Patch

Wed Nov 2 09:28:42 PDT 2011

Geoffrey Garen <ggaren at apple.com> has denied Michael Saboff
<msaboff at apple.com>'s request for review:
Bug 71331: Towards 8 Bit Strings: Templatize JSC::Lexer class by character type
https://bugs.webkit.org/show_bug.cgi?id=71331

Attachment 113262: Proposed Patch
https://bugs.webkit.org/attachment.cgi?id=113262&action=review

------- Additional Comments from Geoffrey Garen <ggaren at apple.com>
View in context: https://bugs.webkit.org/attachment.cgi?id=113262&action=review

r- because it doesn't build. 

I think you can get a lot more speed and simplicity out of this patch. See
below.

> Source/JavaScriptCore/parser/Lexer.cpp:288
> +	   ASSERT(sizeof(T) == sizeof(char));

Should this be sizeof(LChar)?

> Source/JavaScriptCore/parser/Lexer.cpp:289
> +	   data = reinterpret_cast<const T
*>(source.provider()->data()->characters8());

No space before *, please.

Shouldn't <T> always be LChar or UChar? If so, no need for cast here.

> Source/JavaScriptCore/parser/Lexer.cpp:292
> +	   data = reinterpret_cast<const T
*>(source.provider()->data()->characters16());

No space before *, please.

Shouldn't <T> always be LChar or UChar? If so, no need for cast here.

> Source/JavaScriptCore/parser/Lexer.cpp:477
> +    m_buffer16.append(static_cast<UChar>(c));

No need for a cast between LChar and UChar.

> Source/JavaScriptCore/parser/Lexer.cpp:485
> +    m_buffer16.append(static_cast<UChar>(c));

No need for a cast between LChar and UChar.

> Source/JavaScriptCore/parser/Lexer.cpp:492
> +    // FIXME: Change record16 and m_buffer16 to record8 and m_buffer8 below
when
> +    // 8 bit strings are turned on.

Better to just not record at all in the fast case. See below.

> Source/JavaScriptCore/parser/Lexer.cpp:493
> +    int startingOffset = currentOffset();

startingOffset is redundant with identifierStart, and can be removed.

> Source/JavaScriptCore/parser/Lexer.cpp:494
> +    int startingLineNumber = lineNumber();

You can wait to record startingLineNumber until after the keyword case, since
the keyword case does not advance the line number unless it succeeds.

> Source/JavaScriptCore/parser/Lexer.cpp:504
> +    bool bufferRequired = false;

bufferRequired is always false, and can be removed.

> Source/JavaScriptCore/parser/Lexer.cpp:508
> +    while (true) {
> +	   if (LIKELY(isIdentPart(m_current))) {
> +	       shift();

I think this would read more naturally as:

while (isIdentPart(m_current))
    shift();

if (m_current == '\\') {
    ... // Reset everything and return parseIdentifierSlowCase.
}

> Source/JavaScriptCore/parser/Lexer.cpp:516
> +	   m_buffer16.resize(0); // FIXME: Change to m_buffer8

No need to resize to 0, since we haven't appended anything.

> Source/JavaScriptCore/parser/Lexer.cpp:523
> +	   if (!bufferRequired) {

bufferRequired is always false, so you can remove this case.

> Source/JavaScriptCore/parser/Lexer.cpp:529
> +		   append16(identifierStart, currentCharacter() -
identifierStart);  // FIXME: Change to append8
> +	       ident = makeIdentifier(m_buffer16.data(), m_buffer16.size()); //
FIXME: Change to m_buffer8

Why append to one buffer just to copy into another? I think you should just
call makeIdentifier, passing identifierStart and currentCharacter() -
identifierStart, with no intervening append16. Then you can also remove
clearing m_buffer16 from the fast case.

> Source/JavaScriptCore/parser/Lexer.cpp:540
> +	   // Keywords must not be recognized if there was an \uXXXX in the
identifier.

You've already ruled out a Unicode escape above (which is good, because people
don't typically name their variables "MyV\u0061riable"). You can remove all of
this code.

> Source/JavaScriptCore/parser/Lexer.cpp:558
> +template <bool shouldCreateIdentifier> ALWAYS_INLINE JSTokenType
Lexer<T>::parseIdentifierSlowCase(JSTokenData* tokenData, unsigned lexType,
bool strictMode)

Please don't mark the slow case inline.

It seems like you can remove some code from the slow case, which already ran in
the fast case. You can at least remove the fast check for a keyword at the head
of the slow case.

> Source/JavaScriptCore/parser/Lexer.cpp:646
> +	   if (UNLIKELY((m_current == '\\'))) {

Since you've decided that this case is unlikely, can it move out into a helper
function? That way, the codegen for the likely stuff will improve, and this
code can be shared with the slow case.

> Source/JavaScriptCore/parser/Lexer.cpp:690
> +	   // Fast check for characters that require special handling.
> +	   // Catches -1, \n, \r, 0x2028, and 0x2029 as efficiently
> +	   // as possible, and lets through all common ASCII characters.
> +	   if (UNLIKELY(((static_cast<unsigned>(m_current) - 0xE) & 0x2000))) {

You can remove this case entirely if you just change the test above from
"m_current > 0xff" to "m_current > 0xff || m_current < 0xe". (Unless you think
we need to handle the 'bell' character on the fast path ;) ).

> Source/JavaScriptCore/parser/Lexer.h:66
> +// FIXME: LexType and ShiftType were moved outside of the Lexer definition
due
> +// to a clang compiler bug (<rdar://problem/10194295>) that doesn't properly

> +// handle fully qualified enums defined within a template class.

I think this issue is self-documenting: If anybody tries to move this enum
inside the Lexer, it won't compile.

> Source/JavaScriptCore/parser/ParserArena.h:69
> +    ALWAYS_INLINE const Identifier&
IdentifierArena::makeIdentifier(JSGlobalData* globalData, const LChar*
characters, size_t length)

Since this is just a copy of the existing makeIdentifier, I think it would be
better to templatize the existing makeIdentifier, with T* as the type for
"characters".

> Source/JavaScriptCore/runtime/Identifier.cpp:130
> +struct IdentifierLCharBufferTranslator {

Since these two structs are just a copy of UCharBuffer and
IdentifierUCharBufferTranslator, I think you should just templatize those, with
T* as the type for "s".

> Source/JavaScriptCore/runtime/Identifier.cpp:223
> +PassRefPtr<StringImpl> Identifier::add(JSGlobalData* globalData, const
LChar* s, int length)

Since this is just a copy of Identifier::add for const UChar*, I think you
should just templatize that function as well, with T* as the type for "s".