[webkit-reviews] review requested: [Bug 71331] Towards 8 Bit Strings: Templatize JSC::Lexer class by character type : [Attachment 113408] Updated patch

Wed Nov 2 17:44:55 PDT 2011

Michael Saboff <msaboff at apple.com> has asked  for review:
Bug 71331: Towards 8 Bit Strings: Templatize JSC::Lexer class by character type
https://bugs.webkit.org/show_bug.cgi?id=71331

Attachment 113408: Updated patch
https://bugs.webkit.org/attachment.cgi?id=113408&action=review

------- Additional Comments from Michael Saboff <msaboff at apple.com>
(In reply to comment #8)

Most of the suggestions were taken.  The ones not taken have comments below. 
Fixed two build issues.  Still concerned that some compilers may not like
comparison in template function between and unsigned char and 0xff.

> (From update of attachment 113262 [details])
> View in context:
https://bugs.webkit.org/attachment.cgi?id=113262&action=review
> 
> r- because it doesn't build. 
> 
> I think you can get a lot more speed and simplicity out of this patch. See
below.
> 
> > Source/JavaScriptCore/parser/Lexer.cpp:289
> > +	     data = reinterpret_cast<const T
*>(source.provider()->data()->characters8());
> 
> Shouldn't <T> always be LChar or UChar? If so, no need for cast here.

This is needed because this is a templates function that is compiled for both
LChar and UChar types.	One of these cast will be needed for each
instantiation.

> > Source/JavaScriptCore/parser/Lexer.cpp:292
> > +	     data = reinterpret_cast<const T
*>(source.provider()->data()->characters16());
> 
> Shouldn't <T> always be LChar or UChar? If so, no need for cast here.

See previous comment.

> > Source/JavaScriptCore/parser/Lexer.cpp:485
> > +	 m_buffer16.append(static_cast<UChar>(c));
> 
> No need for a cast between LChar and UChar.

This is needed since c is type "int".

> > Source/JavaScriptCore/parser/Lexer.cpp:492
> > +	 // FIXME: Change record16 and m_buffer16 to record8 and m_buffer8
below when
> > +	 // 8 bit strings are turned on.
> 
> Better to just not record at all in the fast case. See below.

We need to record in the case that the buffer type and template type (LChar and
UChar) differ.	The code has been updated accordingly.

> > Source/JavaScriptCore/parser/Lexer.cpp:504
> > +	 bool bufferRequired = false;
>
> bufferRequired is always false, and can be removed.

See previous comment.

> > Source/JavaScriptCore/parser/Lexer.cpp:516
> > +	     m_buffer16.resize(0); // FIXME: Change to m_buffer8
> 
> No need to resize to 0, since we haven't appended anything.

Added an if (bufferRequired) above this.

> > Source/JavaScriptCore/parser/Lexer.cpp:529
> > +		     append16(identifierStart, currentCharacter() -
identifierStart);  // FIXME: Change to append8
> > +		 ident = makeIdentifier(m_buffer16.data(), m_buffer16.size());
// FIXME: Change to m_buffer8
> 
> Why append to one buffer just to copy into another? I think you should just
call makeIdentifier, passing identifierStart and currentCharacter() -
identifierStart, with no intervening append16. Then you can also remove
clearing m_buffer16 from the fast case.

Again, we want to buffer when the source string type is different from the type
we want to create (eventually 8 bit).

> > Source/JavaScriptCore/parser/Lexer.cpp:540
> > +	     // Keywords must not be recognized if there was an \uXXXX in the
identifier.
> 
> You've already ruled out a Unicode escape above (which is good, because
people don't typically name their variables "MyV\u0061riable"). You can remove
all of this code.

The comment doesn't describe the check.  The check is for an identifier that
matches a keyword and is needed.

> > Source/JavaScriptCore/parser/Lexer.cpp:558
> > +template <bool shouldCreateIdentifier> ALWAYS_INLINE JSTokenType
Lexer<T>::parseIdentifierSlowCase(JSTokenData* tokenData, unsigned lexType,
bool strictMode)
> 
> Please don't mark the slow case inline.
> 
> It seems like you can remove some code from the slow case, which already ran
in the fast case. You can at least remove the fast check for a keyword at the
head of the slow case.

Removed ALWAYS_INLINE and cleaned up the code a little.

> > Source/JavaScriptCore/parser/Lexer.cpp:646
> > +	     if (UNLIKELY((m_current == '\\'))) {
> 
> Since you've decided that this case is unlikely, can it move out into a
helper function? That way, the codegen for the likely stuff will improve, and
this code can be shared with the slow case.

Efectively the fast case makes (or will make) 8 bit strings while the slow case
makes 16 bit strings.  Therefore we need to handle the escapes besides \uxxxx.

> > Source/JavaScriptCore/parser/Lexer.cpp:690
> > +	     // Fast check for characters that require special handling.
> > +	     // Catches -1, \n, \r, 0x2028, and 0x2029 as efficiently
> > +	     // as possible, and lets through all common ASCII characters.
> > +	     if (UNLIKELY(((static_cast<unsigned>(m_current) - 0xE) & 0x2000)))
{
> 
> You can remove this case entirely if you just change the test above from
"m_current > 0xff" to "m_current > 0xff || m_current < 0xe". (Unless you think
we need to handle the 'bell' character on the fast path ;) ).

Done.