[webkit-changes] [WebKit/WebKit] 270c82: Implement RegExp `v` flag with set notation + prop...

Michael Saboff noreply at github.com
Fri Mar 3 17:05:06 PST 2023


  Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 270c824459cec4d19dab347a8db1526e0be50737
      https://github.com/WebKit/WebKit/commit/270c824459cec4d19dab347a8db1526e0be50737
  Author: Michael Saboff <msaboff at apple.com>
  Date:   2023-03-03 (Fri, 03 Mar 2023)

  Changed paths:
    M JSTests/es6/Proxy_internal_get_calls_RegExp.prototype.flags.js
    A JSTests/stress/regexp-vflag-property-of-strings.js
    M JSTests/stress/static-getter-in-names.js
    M JSTests/test262/config.yaml
    M LayoutTests/js/Object-getOwnPropertyNames-expected.txt
    M LayoutTests/js/script-tests/Object-getOwnPropertyNames.js
    M Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
    M Source/JavaScriptCore/builtins/BuiltinNames.h
    M Source/JavaScriptCore/builtins/RegExpPrototype.js
    M Source/JavaScriptCore/builtins/StringPrototype.js
    M Source/JavaScriptCore/bytecode/LinkTimeConstant.h
    M Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h
    M Source/JavaScriptCore/dfg/DFGFixupPhase.cpp
    M Source/JavaScriptCore/dfg/DFGOperations.cpp
    M Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp
    M Source/JavaScriptCore/runtime/CachedTypes.cpp
    M Source/JavaScriptCore/runtime/CommonIdentifiers.h
    M Source/JavaScriptCore/runtime/JSGlobalObject.cpp
    M Source/JavaScriptCore/runtime/JSGlobalObject.h
    M Source/JavaScriptCore/runtime/JSGlobalObjectInlines.h
    M Source/JavaScriptCore/runtime/RegExp.h
    M Source/JavaScriptCore/runtime/RegExpCache.h
    M Source/JavaScriptCore/runtime/RegExpObject.cpp
    M Source/JavaScriptCore/runtime/RegExpPrototype.cpp
    A Source/JavaScriptCore/ucd/emoji-sequences.txt
    A Source/JavaScriptCore/ucd/emoji-zwj-sequences.txt
    M Source/JavaScriptCore/yarr/Yarr.h
    M Source/JavaScriptCore/yarr/YarrErrorCode.cpp
    M Source/JavaScriptCore/yarr/YarrErrorCode.h
    M Source/JavaScriptCore/yarr/YarrFlags.cpp
    M Source/JavaScriptCore/yarr/YarrFlags.h
    M Source/JavaScriptCore/yarr/YarrInterpreter.cpp
    M Source/JavaScriptCore/yarr/YarrInterpreter.h
    M Source/JavaScriptCore/yarr/YarrJIT.cpp
    M Source/JavaScriptCore/yarr/YarrParser.h
    M Source/JavaScriptCore/yarr/YarrPattern.cpp
    M Source/JavaScriptCore/yarr/YarrPattern.h
    M Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp
    M Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp
    M Source/JavaScriptCore/yarr/YarrUnicodeProperties.h
    M Source/JavaScriptCore/yarr/generateYarrUnicodePropertyTables.py
    M Source/WebCore/contentextensions/URLFilterParser.cpp

  Log Message:
  -----------
  Implement RegExp `v` flag with set notation + properties of strings
https://bugs.webkit.org/show_bug.cgi?id=241593
rdar://100337109

Reviewed by Yusuke Suzuki.

This change implements the TC39 stage 3 proposal RegExp v flag with set notation + properties of strings,
https://github.com/tc39/proposal-regexp-v-flag.  It adds a new "unicodeSets" compile mode for the Yarr engine.
Like the prior Unicode Yarr features, this change is driven by Unicode Character Database (UCD) files.
This change includes two such new files, JavaScriptCore/ucd/{emoji-sequences.txt & emoji-zwj-sequences.txt}.

The newly added properties include lists of strings.  These strings are processed through the character class
syntax.  When it comes to matching, however, some desugaring turns such a property of strings into
a list of alternations.  For example, if a property has strings str1...strN plus a traditional character class,
single-character-class, we create the pattern equivalent of:
     (?:str1|str2|...|strN|[single-character-class])
Per the spec, longer strings appear earlier in the alternation, and before the traditional character class.
This ensures that a longer string is matched in preference to any of its substrings that also appear
in that list.
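
As a rough illustration of the desugaring described above (not the Yarr implementation), the ordering
can be sketched in Python.  The helper name desugar_class_with_strings and its parameters are hypothetical,
string length is measured in Python code points, and the class body is assumed to be already valid syntax:

```python
import re


def desugar_class_with_strings(strings, single_char_class):
    """Build the equivalent of (?:str1|...|strN|[single-character-class]).

    Hypothetical helper for illustration only; it is not part of Yarr.
    """
    # Longest strings first, so a string is tried before any shorter
    # string that is a prefix of it. Ties are broken alphabetically.
    ordered = sorted(set(strings), key=lambda s: (-len(s), s))
    alternatives = [re.escape(s) for s in ordered]
    # The traditional (single character) class always comes last.
    if single_char_class:
        alternatives.append("[" + single_char_class + "]")
    return "(?:" + "|".join(alternatives) + ")"


pattern = desugar_class_with_strings(["ab", "abc", "x"], "0-9")
match = re.match(pattern, "abcd")
print(pattern, match.group(0))
```

With "ab" listed ahead of "abc", the alternation would stop at the shorter string, which is why the
ordering matters.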

There is a new set of combining operators allowed in class set character classes.  Two character class elements
that appear adjacent to each other implicitly have the Union combining operation.  There is also an Intersection
operation with the && operator and a Subtraction operation with the -- operator.
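
The three combining operations can be modeled with ordinary set arithmetic.  The sketch below is a toy model
under stated assumptions, not the CharacterClassConstructor code: ClassSetModel and char_range are
hypothetical names, and a class set is represented simply as a set of single characters plus a set of strings:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassSetModel:
    """Toy model of a class set: single characters plus multi-character strings."""
    chars: frozenset
    strings: frozenset = frozenset()

    def union(self, other):
        # Implicit operator: two adjacent class elements.
        return ClassSetModel(self.chars | other.chars, self.strings | other.strings)

    def intersection(self, other):
        # The && operator.
        return ClassSetModel(self.chars & other.chars, self.strings & other.strings)

    def subtraction(self, other):
        # The -- operator.
        return ClassSetModel(self.chars - other.chars, self.strings - other.strings)


def char_range(first, last):
    """Return the inclusive range of characters first..last as a frozenset."""
    return frozenset(chr(c) for c in range(ord(first), ord(last) + 1))


lower = ClassSetModel(char_range("a", "z"))
vowels = ClassSetModel(frozenset("aeiou"))
digits_and_pair = ClassSetModel(char_range("0", "9"), frozenset({"ab"}))

consonants = lower.subtraction(vowels)      # models [[a-z]--[aeiou]]
only_vowels = lower.intersection(vowels)    # models [[a-z]&&[aeiou]]
combined = lower.union(digits_and_pair)     # models [[a-z][0-9\q{ab}]]
```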

There is new ClassSet parsing that follows new "cleaner" rules than traditional character classes.
The prior character class constructor and delegates are mostly unchanged, except for the compile mode
now being switched on an enum instead of a bool.

Added check that both 'u' and 'v' flags don't appear in the same RegExp.
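
A minimal sketch of that kind of flag validation, assuming the standard flag letters d, g, i, m, s, u, v and y.
The function name validate_flags and the error messages are hypothetical and do not reflect the Yarr code:

```python
VALID_FLAGS = frozenset("dgimsuvy")


def validate_flags(flags):
    """Return the set of flags, or raise ValueError if the combination is invalid.

    Hypothetical model of RegExp flag checking, for illustration only.
    """
    seen = set()
    for flag in flags:
        if flag not in VALID_FLAGS:
            raise ValueError("unknown flag: " + flag)
        if flag in seen:
            raise ValueError("duplicate flag: " + flag)
        seen.add(flag)
    # 'u' and 'v' select different compile modes and are mutually exclusive.
    if "u" in seen and "v" in seen:
        raise ValueError("flags 'u' and 'v' cannot be combined")
    return seen


print(sorted(validate_flags("gv")))
```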

Added unicodeSets getter watchpoint to the m_regExpPrimordialPropertiesWatchpointSet.

* JSTests/es6/Proxy_internal_get_calls_RegExp.prototype.flags.js:
* JSTests/stress/regexp-vflag-property-of-strings.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
* JSTests/stress/static-getter-in-names.js:
* JSTests/test262/config.yaml:
* LayoutTests/js/Object-getOwnPropertyNames-expected.txt:
* LayoutTests/js/script-tests/Object-getOwnPropertyNames.js:
* Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj:
* Source/JavaScriptCore/builtins/BuiltinNames.h:
* Source/JavaScriptCore/builtins/RegExpPrototype.js:
(linkTimeConstant.hasObservableSideEffectsForRegExpMatch):
(linkTimeConstant.hasObservableSideEffectsForRegExpSplit):
(overriddenName.string_appeared_here.split):
* Source/JavaScriptCore/builtins/StringPrototype.js:
(linkTimeConstant.hasObservableSideEffectsForStringReplace):
* Source/JavaScriptCore/bytecode/LinkTimeConstant.h:
* Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:
(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects):
* Source/JavaScriptCore/dfg/DFGFixupPhase.cpp:
(JSC::DFG::FixupPhase::addStringReplacePrimordialChecks):
* Source/JavaScriptCore/dfg/DFGOperations.cpp:
(JSC::DFG::JSC_DEFINE_JIT_OPERATION):
* Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp:
(JSC::DFG::StrengthReductionPhase::handleNode):
* Source/JavaScriptCore/runtime/CachedTypes.cpp:
* Source/JavaScriptCore/runtime/CommonIdentifiers.h:
* Source/JavaScriptCore/runtime/JSGlobalObject.cpp:
(JSC::JSGlobalObject::init):
* Source/JavaScriptCore/runtime/JSGlobalObject.h:
* Source/JavaScriptCore/runtime/JSGlobalObjectInlines.h:
(JSC::JSGlobalObject::regExpProtoUnicodeSetsGetter const):
* Source/JavaScriptCore/runtime/RegExp.h:
* Source/JavaScriptCore/runtime/RegExpCache.h:
* Source/JavaScriptCore/runtime/RegExpObject.cpp:
(JSC::RegExpObject::matchGlobal):
* Source/JavaScriptCore/runtime/RegExpPrototype.cpp:
(JSC::RegExpPrototype::finishCreation):
(JSC::JSC_DEFINE_HOST_FUNCTION):
* Source/JavaScriptCore/ucd/emoji-sequences.txt: Added.
* Source/JavaScriptCore/ucd/emoji-zwj-sequences.txt: Added.
* Source/JavaScriptCore/yarr/Yarr.h:
* Source/JavaScriptCore/yarr/YarrErrorCode.cpp:
(JSC::Yarr::errorMessage):
(JSC::Yarr::errorToThrow):
* Source/JavaScriptCore/yarr/YarrErrorCode.h:
* Source/JavaScriptCore/yarr/YarrFlags.h:
* Source/JavaScriptCore/yarr/YarrInterpreter.cpp:
(JSC::Yarr::ByteTermDumper::ByteTermDumper):
(JSC::Yarr::ByteTermDumper::unicode const):
(JSC::Yarr::ByteTermDumper::unicodeSets const):
(JSC::Yarr::ByteTermDumper::eitherUnicode const):
(JSC::Yarr::Interpreter::tryConsumeBackReference):
(JSC::Yarr::Interpreter::matchCharacterClass):
(JSC::Yarr::Interpreter::backtrackCharacterClass):
(JSC::Yarr::Interpreter::matchDisjunction):
(JSC::Yarr::Interpreter::Interpreter):
(JSC::Yarr::Interpreter::isLegacyCompilation const):
(JSC::Yarr::Interpreter::isUnicodeCompilation const):
(JSC::Yarr::Interpreter::isUnicodeSetsCompilation const):
(JSC::Yarr::Interpreter::isEitherUnicodeCompilation const):
(JSC::Yarr::ByteTermDumper::dumpTerm):
(JSC::Yarr::ByteTermDumper::unicode): Deleted.
* Source/JavaScriptCore/yarr/YarrInterpreter.h:
(JSC::Yarr::BytecodePattern::BytecodePattern):
(JSC::Yarr::BytecodePattern::compileMode const):
(JSC::Yarr::BytecodePattern::unicodeSets const):
(JSC::Yarr::BytecodePattern::eitherUnicode const):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
* Source/JavaScriptCore/yarr/YarrParser.h:
(JSC::Yarr::Parser::CharacterClassParserDelegate::CharacterClassParserDelegate):
(JSC::Yarr::Parser::ClassSetParserDelegate::ClassSetParserDelegate):
(JSC::Yarr::Parser::ClassSetParserDelegate::begin):
(JSC::Yarr::Parser::ClassSetParserDelegate::nestedClassBegin):
(JSC::Yarr::Parser::ClassSetParserDelegate::doneAfterCharacterClassEnd):
(JSC::Yarr::Parser::ClassSetParserDelegate::setUnionOp):
(JSC::Yarr::Parser::ClassSetParserDelegate::setSubtractOp):
(JSC::Yarr::Parser::ClassSetParserDelegate::setIntersectionOp):
(JSC::Yarr::Parser::ClassSetParserDelegate::flushCachedCharacterIfNeeded):
(JSC::Yarr::Parser::ClassSetParserDelegate::atomPatternCharacter):
(JSC::Yarr::Parser::ClassSetParserDelegate::atomBuiltInCharacterClass):
(JSC::Yarr::Parser::ClassSetParserDelegate::end):
(JSC::Yarr::Parser::ClassSetParserDelegate::error):
(JSC::Yarr::Parser::ClassSetParserDelegate::assertionWordBoundary):
(JSC::Yarr::Parser::ClassSetParserDelegate::atomBackReference):
(JSC::Yarr::Parser::ClassSetParserDelegate::atomNamedBackReference):
(JSC::Yarr::Parser::ClassSetParserDelegate::atomNamedForwardReference):
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::ClassStringDisjunctionParserDelegate):
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomPatternCharacter):
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::newAlternative):
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::end):
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::assertionWordBoundary):
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomBackReference):
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomNamedBackReference):
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomNamedForwardReference):
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomBuiltInCharacterClass):
(JSC::Yarr::Parser::Parser):
(JSC::Yarr::Parser::isIdentityEscapeAnError):
(JSC::Yarr::Parser::parseEscape):
(JSC::Yarr::Parser::consumePossibleSurrogatePair):
(JSC::Yarr::Parser::parseAtomEscape):
(JSC::Yarr::Parser::parseCharacterClassEscape):
(JSC::Yarr::Parser::parseClassSetEscape):
(JSC::Yarr::Parser::parseClassStringDisjunctionEscape):
(JSC::Yarr::Parser::parseCharacterClass):
(JSC::Yarr::Parser::parseClassSet):
(JSC::Yarr::Parser::parseClassStringDisjunction):
(JSC::Yarr::Parser::parseParenthesesEnd):
(JSC::Yarr::Parser::parseTokens):
(JSC::Yarr::Parser::handleIllegalReferences):
(JSC::Yarr::Parser::tryConsumeUnicodeEscape):
(JSC::Yarr::Parser::tryConsumeUnicodePropertyExpression):
(JSC::Yarr::Parser::isLegacyCompilation const):
(JSC::Yarr::Parser::isUnicodeCompilation const):
(JSC::Yarr::Parser::isUnicodeSetsCompilation const):
(JSC::Yarr::Parser::isEitherUnicodeCompilation const):
(JSC::Yarr::compileMode):
(JSC::Yarr::parse):
* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::CharacterClassConstructor::CharacterClassConstructor):
(JSC::Yarr::CharacterClassConstructor::reset):
(JSC::Yarr::CharacterClassConstructor::combiningSetOp):
(JSC::Yarr::CharacterClassConstructor::append):
(JSC::Yarr::CharacterClassConstructor::appendInverted):
(JSC::Yarr::CharacterClassConstructor::putRange):
(JSC::Yarr::CharacterClassConstructor::atomClassStringDisjunction):
(JSC::Yarr::CharacterClassConstructor::performSetOpWith):
(JSC::Yarr::CharacterClassConstructor::performSetOpWithStrings):
(JSC::Yarr::CharacterClassConstructor::performSetOpWithMatches):
(JSC::Yarr::CharacterClassConstructor::hasInverteStrings):
(JSC::Yarr::CharacterClassConstructor::compareUTF32Strings):
(JSC::Yarr::CharacterClassConstructor::sort):
(JSC::Yarr::CharacterClassConstructor::charClass):
(JSC::Yarr::CharacterClassConstructor::mergeRangesFrom):
(JSC::Yarr::CharacterClassConstructor::unionStrings):
(JSC::Yarr::CharacterClassConstructor::intersectionStrings):
(JSC::Yarr::CharacterClassConstructor::subtractionStrings):
(JSC::Yarr::CharacterClassConstructor::asciiOpSorted):
(JSC::Yarr::CharacterClassConstructor::unicodeOpSorted):
(JSC::Yarr::YarrPatternConstructor::YarrPatternConstructor):
(JSC::Yarr::YarrPatternConstructor::resetForReparsing):
(JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):
(JSC::Yarr::YarrPatternConstructor::atomBuiltInCharacterClass):
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassAtom):
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassRange):
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassBuiltIn):
(JSC::Yarr::YarrPatternConstructor::atomClassStringDisjunction):
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassSetOp):
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassPushNested):
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassPopNested):
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassEnd):
(JSC::Yarr::YarrPattern::compile):
(JSC::Yarr::PatternTerm::dump):
(JSC::Yarr::YarrPattern::dumpPatternString):
(JSC::Yarr::YarrPattern::dumpPattern):
* Source/JavaScriptCore/yarr/YarrPattern.h:
(JSC::Yarr::CharacterClass::CharacterClass):
(JSC::Yarr::CharacterClass::hasNonBMPCharacters const):
(JSC::Yarr::CharacterClass::hasOneCharacterSize const):
(JSC::Yarr::CharacterClass::hasOnlyNonBMPCharacters const):
(JSC::Yarr::CharacterClass::hasStrings const):
(JSC::Yarr::CharacterClass::hasSingleCharacters const):
(JSC::Yarr::ClassSet::ClassSet):
(JSC::Yarr::YarrPattern::unicodeSets const):
(JSC::Yarr::YarrPattern::eitherUnicode const):
(JSC::Yarr::YarrPattern::compileMode const):
(JSC::Yarr::CharacterClass::hasNonBMPCharacters): Deleted.
(JSC::Yarr::CharacterClass::hasOneCharacterSize): Deleted.
(JSC::Yarr::CharacterClass::hasOnlyNonBMPCharacters): Deleted.
* Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:
(JSC::Yarr::SyntaxChecker::atomClassStringDisjunction):
(JSC::Yarr::SyntaxChecker::atomCharacterClassSetOp):
(JSC::Yarr::SyntaxChecker::atomCharacterClassPushNested):
(JSC::Yarr::SyntaxChecker::atomCharacterClassPopNested):
(JSC::Yarr::checkSyntax):
* Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:
(JSC::Yarr::unicodeMatchProperty):
(JSC::Yarr::createUnicodeCharacterClassFor):
(JSC::Yarr::characterClassMayContainStrings):
* Source/JavaScriptCore/yarr/YarrUnicodeProperties.h:
* Source/JavaScriptCore/yarr/generateYarrUnicodePropertyTables.py:
(PropertyData.__init__):
(PropertyData.addMatchString):
(PropertyData.stringsCompare):
(PropertyData):
(PropertyData.sortStrings):
(PropertyData.dumpMatchData):
(PropertyData.convertStringToCppFormat):
(PropertyData.dump):
(PropertyData.dumpAll):
(PropertyData.dumpMayContainStringFunc):
(BinaryProperty.dump):
(SequenceProperty):
(SequenceProperty.__init__):
(SequenceProperty.parsePropertyFile):
(SequenceProperty.dump):
* Source/WebCore/contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::PatternParser::atomClassStringDisjunction):
(WebCore::ContentExtensions::PatternParser::atomCharacterClassSetOp):
(WebCore::ContentExtensions::PatternParser::atomCharacterClassPushNested):
(WebCore::ContentExtensions::PatternParser::atomCharacterClassPopNested):
(WebCore::ContentExtensions::URLFilterParser::addPattern):

Canonical link: https://commits.webkit.org/261188@main



