[webkit-reviews] review granted: [Bug 233276] [WGSL] Implement enough of the lexer for the simplest shaders : [Attachment 445349] Patch

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Nov 29 23:06:08 PST 2021


Myles C. Maxfield <mmaxfield at apple.com> has granted Robin Morisset
<rmorisset at apple.com>'s request for review:
Bug 233276: [WGSL] Implement enough of the lexer for the simplest shaders
https://bugs.webkit.org/show_bug.cgi?id=233276

Attachment 445349: Patch

https://bugs.webkit.org/attachment.cgi?id=445349&action=review




--- Comment #5 from Myles C. Maxfield <mmaxfield at apple.com> ---
Comment on attachment 445349
  --> https://bugs.webkit.org/attachment.cgi?id=445349
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=445349&action=review

> Source/WebGPU/WGSL/AST/ASTNode.h:30
> +namespace WGSL { namespace AST {

namespace WGSL::AST {

> Source/WebGPU/WGSL/AST/ASTNode.h:39
> +// FIXME: all of these should probably be marked FAST_MALLOC or whatever it
is to use bmalloc/libpas

I did something similar in https://trac.webkit.org/changeset/286224/webkit

> Source/WebGPU/WGSL/AST/Attribute.h:32
> +class Attribute : public ASTNode {};

Nit: Can classes have the } on a new line?

Why are attributes represented as AST nodes? They're not recursive. Why not
treat them as data?

> Source/WebGPU/WGSL/AST/Attribute.h:36
> +class AttributeGroup : public Attribute {

Nit: Usually descriptions go before the thing they're describing, so this
should probably be GroupAttribute (and ditto for below).

> Source/WebGPU/WGSL/AST/Attribute.h:38
> +    unsigned group() const { return m_value; }

Nit: usually we have an empty line before private: (or protected:)

> Source/WebGPU/WGSL/AST/Attribute.h:52
> +    enum Stage : uint8_t {

enum class?

> Source/WebGPU/WGSL/AST/Expression.h:32
> +class Expression : public ASTNode {};

This file is going to get quite big. Are you sure you want to have all nodes
defined in a single file?

> Source/WebGPU/WGSL/AST/GlobalDecl.h:50
> +    // Each of the following may be null

Nit: I don't think this comment is particularly helpful, because we have a
class UniqueRef for non-nullable unique ptrs, so the type says that it can be
null.

The comment on the next line, though, is helpful.

> Source/WebGPU/WGSL/AST/GlobalDecl.h:63
> +    StringView m_name;

It's cool that you're using StringViews and not Strings.

> Source/WebGPU/WGSL/AST/GlobalDecl.h:66
> +class StructDecl : public GlobalDecl {

Nit: Blank line between classes, please.

> Source/WebGPU/WGSL/AST/GlobalDecl.h:85
> +};

Ditto.

> Source/WebGPU/WGSL/AST/GlobalDecl.h:92
> +    Type* maybeReturnType() { return m_returnType.get(); }

Because this returns a pointer, I don't see much value in calling it "maybe".
If it wasn't "maybe," it would return a reference.

> Source/WebGPU/WGSL/AST/Statement.h:34
> +class CompoundStatement : public Statement {

Why not "Block"?

> Source/WebGPU/WGSL/AST/Statement.h:41
> +class ReturnStatement : public Statement {

You're probably going to want the SPECIALIZE_TYPE_TRAITS stuff and virtual
functions to be able to downcast these.

> Source/WebGPU/WGSL/AST/Statement.h:43
> +    Expression* maybeExpression() { return m_expression.get(); }

Ditto about "maybe"

> Source/WebGPU/WGSL/AST/Statement.h:50
> +    Expression* maybeLhs() { return m_lhs.get(); }

... and other places too

> Source/WebGPU/WGSL/AST/Statement.h:51
> +    Expression* rhs() { return m_rhs.get(); }

This should probably return a &

> Source/WebGPU/WGSL/AST/Statement.h:55
> +    std::unique_ptr<Expression> m_rhs;

This should probably be a UniqueRef

> Source/WebGPU/WGSL/AST/Type.h:44
> +    enum Kind {

Woah. I guess that's cool, since we don't have real templates/generics (yet).

> Source/WebGPU/WGSL/AST/Type.h:68
> +class BoolType : public Type {};
> +class Float32Type : public Type {};
> +class Int32Type : public Type {};
> +class Uint32Type : public Type {};

Should these really be C++ types? User-defined types will be represented in C++
as instances of a C++ class, not as new C++ classes. Why not just have an
instance of Type in-memory that is "the Float32 type"?

> Source/WebGPU/WGSL/AST/VariableQualifier.h:40
> +enum class AccessMode : uint8_t {

Other places you have the enum class inside the class its used in. How do you
decide whether it should be in the class or outside the class?

> Source/WebGPU/WGSL/Lexer.cpp:33
> +Token Lexer<T>::lex()

Aren't there lexer generators? Did you consider using one of them?

> Source/WebGPU/WGSL/Lexer.cpp:60
> +	   case '(':

WebKit style is to put the cases at the same indentation as the switch.

> Source/WebGPU/WGSL/Lexer.cpp:107
> +	       std::optional<uint64_t> postPeriod = parseDecimalInteger();

How does this work with "a.b"?

> Source/WebGPU/WGSL/Lexer.cpp:111
> +	       literalValue /= pow(10, m_offset - offset);

How confident are you that we're getting the right ULPs here?

> Source/WebGPU/WGSL/Lexer.cpp:123
> +	       break;

I don't understand how this works. surely "a - b" shouldn't return an invalid
token?

How does unary minus work?

> Source/WebGPU/WGSL/Lexer.cpp:155
> +			   fractionalPart /= pow(10, m_offset - offset);

Ditto about ULPs

> Source/WebGPU/WGSL/Lexer.cpp:222
> +		   // FIXME: a trie would be more efficient here, look at
JavaScriptCore/KeywordLookupGenerator.py for an example of code autogeneration
that produces such a trie.

Yep.

Do we actually need a custom python script, or can we use some existing
package? Maybe something in lex/yacc/bison/something else?

> Source/WebGPU/WGSL/Lexer.cpp:224
> +		   // FIXME: I don't think that true/false/f32/u32/i32/bool
need to be their own tokens, they could just be regular identifiers.

Is it legal to say "let f32 = 7;"? If not, they should probably be their own
tokens.

> Source/WebGPU/WGSL/Lexer.cpp:270
> +	   if (m_current == '\n') {

we have a name "newlineCharacter" for this in wtf/CharacterNames.h

> Source/WebGPU/WGSL/Lexer.cpp:309
> +    if (value.hasOverflowed())
> +	   return std::nullopt;

Surely this should fail in a different way than the !isDecimal() check above?

> Source/WebGPU/WGSL/Lexer.cpp:450
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isWhiteSpace(UChar ch)
> +{
> +    // WGSL does not currently support any non-ASCII whitespace characters
> +    return Lexer<LChar>::isWhiteSpace(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isIdentifierStart(UChar ch)
> +{
> +    // WGSL does not currently support any non-ASCII characters in
identifiers
> +    return Lexer<LChar>::isIdentifierStart(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isValidIdentifierCharacter(UChar ch)
> +{
> +    // WGSL does not currently support any non-ASCII characters in
identifiers
> +    return Lexer<LChar>::isValidIdentifierCharacter(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isDecimal(UChar ch)
> +{
> +    return Lexer<LChar>::isDecimal(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isHexadecimal(UChar ch)
> +{
> +    return Lexer<LChar>::isHexadecimal(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE uint64_t Lexer<UChar>::readDecimal(UChar ch)
> +{
> +    return Lexer<LChar>::readDecimal(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE uint64_t Lexer<UChar>::readHexadecimal(UChar ch)
> +{
> +    return Lexer<LChar>::readHexadecimal(static_cast<LChar>(ch));
> +}

This I think is more evidence that a StringView and a size_t will be
sufficient.

> Source/WebGPU/WGSL/Lexer.h:39
> +	   if constexpr (std::is_same<T, LChar>::value) {

https://mobile.twitter.com/Litherum/status/1456047144519954436

I'm still not convinced that we actually need this template goop though. A
StringView and a size_t should be sufficient.

> Source/WebGPU/WGSL/Lexer.h:71
> +    // Parse pattern (e|E)(\+|-)?[0-9]+f? if it is present, and return the
exponent

Perhaps these comments would be more useful around the definition, rather than
the declaration.

> Source/WebGPU/WGSL/Lexer.h:75
> +    const T* m_code;

Why did you choose to hold this in a pointer, rather than a StringView and a
size_t? If you did it that way, this whole thing wouldn't need to be templated.

> Source/WebGPU/WGSL/Parser.cpp:35
> +    (void) lexer;

UNUSED_VARIABLE(lexer);

> Source/WebGPU/WGSL/Parser.cpp:37
> +    return { { }, { } };

This is my favorite shader. ��

> Source/WebGPU/WGSL/Parser.h:32
> +template<typename Lexer>

Are you planning on having more than one kind of lexer? Why not just hardcode
the one lexer that we've got?

> Source/WebGPU/WGSL/SourceSpan.h:40
> +    unsigned m_line;
> +    unsigned m_lineOffset;
> +    unsigned m_offset;
> +    unsigned m_length;

Can we derive 2 of these 4 fields at reporting time, so we can store half the
data in all the nodes?

> Source/WebGPU/WGSL/SourceSpan.h:42
> +    SourceSpan(unsigned line, unsigned lineOffset, unsigned offset, unsigned
length)

This is unnecessary. In the places where you could have written SourceSpan(1,
2, 3, 4) you can instead write SourceSpan { 1, 2, 3, 4 } (or sometimes even
just { 1, 2, 3, 4 }). Using { } is preferred according to webkit style.

> Source/WebGPU/WGSL/Token.h:51
> +    // FIXME: add all the other keywords:
file:///Users/rmorisset/MyGPUWeb/wgsl/wgsl.html#keyword-summary

putting paths inside your local machine probably isn't the best thing

> Source/WebGPU/WGSL/Token.h:76
> +    union {

Can we use Variant instead?

> Source/WebGPU/WGSL/Token.h:102
> +	   ASSERT(type == TokenType::IntegerLiteral
> +	       || type == TokenType::IntegerLiteralSigned
> +	       || type == TokenType::IntegerLiteralUnsigned
> +	       || type == TokenType::DecimalFloatLiteral
> +	       || type == TokenType::HexFloatLiteral);

I think there's a way to use the C++ type system to turn these into compile
errors rather than runtime errors. Same for the other constructors.

> Source/WebGPU/WGSL/Token.h:136
> +	   if (m_type == TokenType::Identifier)
> +	       (&m_ident)->~StringView();

Using a variant would get rid of this

> Source/WebGPU/WGSLUnitTests/WGSLLexerTests.mm:109
> +    ++lineNumber;
> +    checkNextTokenIsIdentifier("a");
> +    checkNextToken(WGSL::TokenType::Colon);
> +    checkNextTokenIsIdentifier("i32");
> +    checkNextToken(WGSL::TokenType::Semicolon);

Cool.


More information about the webkit-reviews mailing list