[Webkit-unassigned] [Bug 41845] refactor identifier parsing in lexer

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Jul 13 10:41:31 PDT 2010


https://bugs.webkit.org/show_bug.cgi?id=41845


Darin Adler <darin at apple.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #61361|review?                     |review-
               Flag|                            |




--- Comment #5 from Darin Adler <darin at apple.com>  2010-07-13 10:41:31 PST ---
(From update of attachment 61361)
Looks good.

> +    // The first free types are fixed, and also used for
> +    // identifying ascii alpha and alphanumeric characters

I think you mean the first *three* types. This comment should have a period, and ASCII should be in all capitals.

>  // 128 ascii codes

ASCII.

> -static unsigned short AsciiCharacters[128] = {
> +static unsigned short asciiCharacters[128] = {

Can this be static const?

The name of this array is not great. It's called asciiCharacters, but it's not an array of ASCII characters. It's actually an array of the character type of each ASCII character. So its name should be asciiCharacterTypes or typesOfASCIICharacters or something like that.

Also, the CharacterTypes enum has a strange name. A value of this enum type is a single character type, so the type name should be CharacterType. Just as "int" is not named "ints".

>  static inline bool isIdentStart(int c)
>  {
> -    return isASCII(c) ? isASCIIAlpha(c) || c == '$' || c == '_' : isNonASCIIIdentStart(c);
> +    return isASCII(c) ? asciiCharacters[c] == CharacterAlpha : isNonASCIIIdentStart(c);
>  }

CharacterAlpha seems like the wrong name for something that includes "$" and "_". Is this a term from the ECMAScript specification? If not, then I suggest that we don't call this CharacterAlpha, and instead call it CharacterIdentifierStart or something along those lines.

>  static inline bool isIdentPart(int c)
>  {
> -    return isASCII(c) ? isASCIIAlphanumeric(c) || c == '$' || c == '_' : isNonASCIIIdentPart(c);
> +    return isASCII(c) ? asciiCharacters[c] <= CharacterNumber : isNonASCIIIdentPart(c);
>  }

Needs a comment to explain why "<=" is correct here.

> +    // temporary contains an ascii representation of the code
> +    // helps to avoid code duplication

It's not clear to me what "temporary" is. Maybe you mean the local variable named "asciiCharacter". Comment should be sentence style. "ascii" should be "ASCII".

I don't think this comment says anything more clearly than the code. Saying "avoid code duplication" is too vague to add much clarity. I suggest omitting this comment entirely.

> +        if (isNonASCIIIdentStart(m_current))
> +            asciiCharacter = 'a';

This code, however, does need a comment. The comment needs to say why it's OK to set the local variable to the letter "a". Comments are most valuable when they explain "why" rather than "what" -- the code itself makes it clear "what". In this case, we set asciiCharacter to any legal identifier starting character so we can get the right result from the code below, but the actual character is not used for anything else.

I don't think the code needs to work this confusing way. Instead the local variable could just be a CharacterTypes value.

    CharacterType type;
    if (LIKELY(isASCII(m_current))
        type = typesOfASCIICharacters[m_current];
    else {
        if (isNonASCIIIdentStart(m_current))
            type = CharacterIdentifierStart;
        else
            type = CharacterLineTerminator;
    }
    switch (type) {


That code seems clear enough that you don't even need comments for it, and no less efficient than the code we have in the patch.

Can you separate the indentation change of the giant switch statement from the rest of the patch somehow? The diff is hard to read because of that.

I'm going to say review- because I think this change is unnecessarily oblique and it's easy to make it better. You could also consider some of my naming suggestions to make the code easier to understand.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.



More information about the webkit-unassigned mailing list