[webkit-reviews] review granted: [Bug 233276] [WGSL] Implement enough of the lexer for the simplest shaders : [Attachment 445349] Patch
bugzilla-daemon at webkit.org
bugzilla-daemon at webkit.org
Mon Nov 29 23:06:08 PST 2021
Myles C. Maxfield <mmaxfield at apple.com> has granted Robin Morisset
<rmorisset at apple.com>'s request for review:
Bug 233276: [WGSL] Implement enough of the lexer for the simplest shaders
https://bugs.webkit.org/show_bug.cgi?id=233276
Attachment 445349: Patch
https://bugs.webkit.org/attachment.cgi?id=445349&action=review
--- Comment #5 from Myles C. Maxfield <mmaxfield at apple.com> ---
Comment on attachment 445349
--> https://bugs.webkit.org/attachment.cgi?id=445349
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=445349&action=review
> Source/WebGPU/WGSL/AST/ASTNode.h:30
> +namespace WGSL { namespace AST {
namespace WGSL::AST {
> Source/WebGPU/WGSL/AST/ASTNode.h:39
> +// FIXME: all of these should probably be marked FAST_MALLOC or whatever it
is to use bmalloc/libpas
I did something similar in https://trac.webkit.org/changeset/286224/webkit
> Source/WebGPU/WGSL/AST/Attribute.h:32
> +class Attribute : public ASTNode {};
Nit: Can classes have the } on a new line?
Why are attributes represented as AST nodes? They're not recursive. Why not
treat them as data?
> Source/WebGPU/WGSL/AST/Attribute.h:36
> +class AttributeGroup : public Attribute {
Nit: Usually descriptions go before the thing they're describing, so this
should probably be GroupAttribute (and ditto for below).
> Source/WebGPU/WGSL/AST/Attribute.h:38
> + unsigned group() const { return m_value; }
Nit: usually we have an empty line before private: (or protected:)
> Source/WebGPU/WGSL/AST/Attribute.h:52
> + enum Stage : uint8_t {
enum class?
> Source/WebGPU/WGSL/AST/Expression.h:32
> +class Expression : public ASTNode {};
This file is going to get quite big. Are you sure you want to have all nodes
defined in a single file?
> Source/WebGPU/WGSL/AST/GlobalDecl.h:50
> + // Each of the following may be null
Nit: I don't think this comment is particularly helpful, because we have a
class UniqueRef for non-nullable unique ptrs, so the type says that it can be
null.
The comment on the next line, though, is helpful.
> Source/WebGPU/WGSL/AST/GlobalDecl.h:63
> + StringView m_name;
It's cool that you're using StringViews and not Strings.
> Source/WebGPU/WGSL/AST/GlobalDecl.h:66
> +class StructDecl : public GlobalDecl {
Nit: Blank line between classes, please.
> Source/WebGPU/WGSL/AST/GlobalDecl.h:85
> +};
Ditto.
> Source/WebGPU/WGSL/AST/GlobalDecl.h:92
> + Type* maybeReturnType() { return m_returnType.get(); }
Because this returns a pointer, I don't see much value in calling it "maybe".
If it wasn't "maybe," it would return a reference.
> Source/WebGPU/WGSL/AST/Statement.h:34
> +class CompoundStatement : public Statement {
Why not "Block"?
> Source/WebGPU/WGSL/AST/Statement.h:41
> +class ReturnStatement : public Statement {
You're probably going to want the SPECIALIZE_TYPE_TRAITS stuff and virtual
functions to be able to downcast these.
> Source/WebGPU/WGSL/AST/Statement.h:43
> + Expression* maybeExpression() { return m_expression.get(); }
Ditto about "maybe"
> Source/WebGPU/WGSL/AST/Statement.h:50
> + Expression* maybeLhs() { return m_lhs.get(); }
... and other places too
> Source/WebGPU/WGSL/AST/Statement.h:51
> + Expression* rhs() { return m_rhs.get(); }
This should probably return a &
> Source/WebGPU/WGSL/AST/Statement.h:55
> + std::unique_ptr<Expression> m_rhs;
This should probably be a UniqueRef
> Source/WebGPU/WGSL/AST/Type.h:44
> + enum Kind {
Woah. I guess that's cool, since we don't have real templates/generics (yet).
> Source/WebGPU/WGSL/AST/Type.h:68
> +class BoolType : public Type {};
> +class Float32Type : public Type {};
> +class Int32Type : public Type {};
> +class Uint32Type : public Type {};
Should these really be C++ types? User-defined types will be represented in C++
as instances of a C++ class, not as new C++ classes. Why not just have an
instance of Type in-memory that is "the Float32 type"?
> Source/WebGPU/WGSL/AST/VariableQualifier.h:40
> +enum class AccessMode : uint8_t {
Other places you have the enum class inside the class its used in. How do you
decide whether it should be in the class or outside the class?
> Source/WebGPU/WGSL/Lexer.cpp:33
> +Token Lexer<T>::lex()
Aren't there lexer generators? Did you consider using one of them?
> Source/WebGPU/WGSL/Lexer.cpp:60
> + case '(':
WebKit style is to put the cases at the same indentation as the switch.
> Source/WebGPU/WGSL/Lexer.cpp:107
> + std::optional<uint64_t> postPeriod = parseDecimalInteger();
How does this work with "a.b"?
> Source/WebGPU/WGSL/Lexer.cpp:111
> + literalValue /= pow(10, m_offset - offset);
How confident are you that we're getting the right ULPs here?
> Source/WebGPU/WGSL/Lexer.cpp:123
> + break;
I don't understand how this works. surely "a - b" shouldn't return an invalid
token?
How does unary minus work?
> Source/WebGPU/WGSL/Lexer.cpp:155
> + fractionalPart /= pow(10, m_offset - offset);
Ditto about ULPs
> Source/WebGPU/WGSL/Lexer.cpp:222
> + // FIXME: a trie would be more efficient here, look at
JavaScriptCore/KeywordLookupGenerator.py for an example of code autogeneration
that produces such a trie.
Yep.
Do we actually need a custom python script, or can we use some existing
package? Maybe something in lex/yacc/bison/something else?
> Source/WebGPU/WGSL/Lexer.cpp:224
> + // FIXME: I don't think that true/false/f32/u32/i32/bool
need to be their own tokens, they could just be regular identifiers.
Is it legal to say "let f32 = 7;"? If not, they should probably be their own
tokens.
> Source/WebGPU/WGSL/Lexer.cpp:270
> + if (m_current == '\n') {
we have a name "newlineCharacter" for this in wtf/CharacterNames.h
> Source/WebGPU/WGSL/Lexer.cpp:309
> + if (value.hasOverflowed())
> + return std::nullopt;
Surely this should fail in a different way than the !isDecimal() check above?
> Source/WebGPU/WGSL/Lexer.cpp:450
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isWhiteSpace(UChar ch)
> +{
> + // WGSL does not currently support any non-ASCII whitespace characters
> + return Lexer<LChar>::isWhiteSpace(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isIdentifierStart(UChar ch)
> +{
> + // WGSL does not currently support any non-ASCII characters in
identifiers
> + return Lexer<LChar>::isIdentifierStart(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isValidIdentifierCharacter(UChar ch)
> +{
> + // WGSL does not currently support any non-ASCII characters in
identifiers
> + return Lexer<LChar>::isValidIdentifierCharacter(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isDecimal(UChar ch)
> +{
> + return Lexer<LChar>::isDecimal(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE bool Lexer<UChar>::isHexadecimal(UChar ch)
> +{
> + return Lexer<LChar>::isHexadecimal(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE uint64_t Lexer<UChar>::readDecimal(UChar ch)
> +{
> + return Lexer<LChar>::readDecimal(static_cast<LChar>(ch));
> +}
> +
> +template <>
> +ALWAYS_INLINE uint64_t Lexer<UChar>::readHexadecimal(UChar ch)
> +{
> + return Lexer<LChar>::readHexadecimal(static_cast<LChar>(ch));
> +}
This I think is more evidence that a StringView and a size_t will be
sufficient.
> Source/WebGPU/WGSL/Lexer.h:39
> + if constexpr (std::is_same<T, LChar>::value) {
https://mobile.twitter.com/Litherum/status/1456047144519954436
I'm still not convinced that we actually need this template goop though. A
StringView and a size_t should be sufficient.
> Source/WebGPU/WGSL/Lexer.h:71
> + // Parse pattern (e|E)(\+|-)?[0-9]+f? if it is present, and return the
exponent
Perhaps these comments would be more useful around the definition, rather than
the declaration.
> Source/WebGPU/WGSL/Lexer.h:75
> + const T* m_code;
Why did you choose to hold this in a pointer, rather than a StringView and a
size_t? If you did it that way, this whole thing wouldn't need to be templated.
> Source/WebGPU/WGSL/Parser.cpp:35
> + (void) lexer;
UNUSED_VARIABLE(lexer);
> Source/WebGPU/WGSL/Parser.cpp:37
> + return { { }, { } };
This is my favorite shader.
> Source/WebGPU/WGSL/Parser.h:32
> +template<typename Lexer>
Are you planning on having more than one kind of lexer? Why not just hardcode
the one lexer that we've got?
> Source/WebGPU/WGSL/SourceSpan.h:40
> + unsigned m_line;
> + unsigned m_lineOffset;
> + unsigned m_offset;
> + unsigned m_length;
Can we derive 2 of these 4 fields at reporting time, so we can store half the
data in all the nodes?
> Source/WebGPU/WGSL/SourceSpan.h:42
> + SourceSpan(unsigned line, unsigned lineOffset, unsigned offset, unsigned
length)
This is unnecessary. In the places where you could have written SourceSpan(1,
2, 3, 4) you can instead write SourceSpan { 1, 2, 3, 4 } (or sometimes even
just { 1, 2, 3, 4 }). Using { } is preferred according to webkit style.
> Source/WebGPU/WGSL/Token.h:51
> + // FIXME: add all the other keywords:
file:///Users/rmorisset/MyGPUWeb/wgsl/wgsl.html#keyword-summary
putting paths inside your local machine probably isn't the best thing
> Source/WebGPU/WGSL/Token.h:76
> + union {
Can we use Variant instead?
> Source/WebGPU/WGSL/Token.h:102
> + ASSERT(type == TokenType::IntegerLiteral
> + || type == TokenType::IntegerLiteralSigned
> + || type == TokenType::IntegerLiteralUnsigned
> + || type == TokenType::DecimalFloatLiteral
> + || type == TokenType::HexFloatLiteral);
I think there's a way to use the C++ type system to turn these into compile
errors rather than runtime errors. Same for the other constructors.
> Source/WebGPU/WGSL/Token.h:136
> + if (m_type == TokenType::Identifier)
> + (&m_ident)->~StringView();
Using a variant would get rid of this
> Source/WebGPU/WGSLUnitTests/WGSLLexerTests.mm:109
> + ++lineNumber;
> + checkNextTokenIsIdentifier("a");
> + checkNextToken(WGSL::TokenType::Colon);
> + checkNextTokenIsIdentifier("i32");
> + checkNextToken(WGSL::TokenType::Semicolon);
Cool.
More information about the webkit-reviews
mailing list