[webkit-changes] [WebKit/WebKit] 67969c: [JSC] RegExp /u flag doesn't respect atomicity of ...

Michael Saboff noreply at github.com
Wed Mar 13 09:47:23 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 67969c218ddf357855d3c26ca4769b194fb1f4db
      https://github.com/WebKit/WebKit/commit/67969c218ddf357855d3c26ca4769b194fb1f4db
  Author: Michael Saboff <msaboff at apple.com>
  Date:   2024-03-13 (Wed, 13 Mar 2024)

  Changed paths:
    A JSTests/stress/regexp-unicode-dangling-surrogates.js
    M JSTests/test262/expectations.yaml
    M Source/JavaScriptCore/assembler/MacroAssemblerARM64.h
    M Source/JavaScriptCore/yarr/YarrInterpreter.cpp
    M Source/JavaScriptCore/yarr/YarrJIT.cpp
    M Source/JavaScriptCore/yarr/YarrJITRegisters.h

  Log Message:
  -----------
  [JSC] RegExp /u flag doesn't respect atomicity of surrogate pairs
https://bugs.webkit.org/show_bug.cgi?id=267011
rdar://124217243

Reviewed by Yusuke Suzuki.

Fixed a bug where a dangling surrogate in a pattern matches half of a valid surrogate pair in a subject string.
Updated the reading of surrogates so that when we read starting in the middle of a valid surrogate pair, we return an error
code point which we never match.  Updated backtracking for non-greedy character class matching to use the start index
as the appropriate index to reset when we fail to match, instead of doing math with the current match count.
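
As an illustration of the behavior this fix targets, the sketch below checks a dangling surrogate pattern against
a valid pair and against a lone surrogate.  It relies only on standard ECMAScript /u semantics; the variable names
are made up for this example and it is not taken from the new stress test.

```javascript
// U+1F600 is encoded in UTF-16 as the surrogate pair D83D DE00.
const pair = "\uD83D\uDE00";
const loneLead = "\uD83D";
const loneTrail = "\uDE00";

// With /u the subject is matched by code point, so a dangling surrogate in the
// pattern must not match either half of a valid pair.
const leadMatchesPair = /\uD83D/u.test(pair);
const trailMatchesPair = /\uDE00/u.test(pair);

// A lone surrogate in the subject is still matched by the same pattern.
const leadMatchesLone = /\uD83D/u.test(loneLead);
const trailMatchesLone = /\uDE00/u.test(loneTrail);

// Without /u, matching is by code unit, so half of the pair does match.
const leadMatchesPairNoU = /\uD83D/.test(pair);

console.log({ leadMatchesPair, trailMatchesPair, leadMatchesLone, trailMatchesLone, leadMatchesPairNoU });
```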

The fix above originally landed Jan 13, but it regressed some Unicode performance tests and was subsequently rolled out.

This change builds on the prior fix by adding three optimizations to mitigate the performance loss in the earlier fix.
 1. We don't need to check for the errorCodePoint (-1) after reading a dangling surrogate when we are matching normal,
    non-inverted atoms.  The errorCodePoint won't match in that case.  For inverted atoms, we still need to check for the
    errorCodePoint and fail matching that atom.

 2. Changed the code emitted for a character class that has only one range.  Before this change, we'd emit all range
    checks with each range check's failure target being the instruction right after its two conditional branches.
    This works fine if there is another range check.  When all range checks have been performed, we add a branch to the
    failure (backtracking) code.

    If the character class has only one range and doesn't have any list of single characters, we can eliminate the branch
    to failure code by changing the two conditional branches that make up a range check to go directly to the failure code.

    This change appears to help JetStream2/babylon-wtp by 1.5+%.

 3. (ARM64 only) When we read a non-BMP code point, consisting of two surrogate code units, and we fail to match any atom
    in the body of a RegExp, we were incrementing the subject string index by 1 and going back to the top of the loop to
    start matching the pattern again.  Now we dedicate a register to hold either 0 or 1 depending on the width of the first
    character read for that loop iteration.  When advancing the index for the next iteration, we add the value of that register
    to the updated index.  This eliminates one iteration through the matching loop for each non-BMP code point that doesn't
    match.

    This change appears to help JetStream2/UniPoker by 3+%.
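
For optimization 1 above, a minimal JavaScript model of the idea may help.  Only the errorCodePoint value (-1) comes
from this change; readCodePoint and matchesClass are hypothetical names used for illustration and do not mirror the
actual Yarr code.

```javascript
const errorCodePoint = -1;

const isLead = (u) => u >= 0xD800 && u <= 0xDBFF;
const isTrail = (u) => u >= 0xDC00 && u <= 0xDFFF;

// Hypothetical model of reading one code point in /u mode.
function readCodePoint(subject, index) {
    const unit = subject.charCodeAt(index);
    // Starting on the trail half of a valid pair yields the error code point.
    if (isTrail(unit) && index > 0 && isLead(subject.charCodeAt(index - 1)))
        return errorCodePoint;
    if (isLead(unit) && index + 1 < subject.length) {
        const next = subject.charCodeAt(index + 1);
        if (isTrail(next))
            return 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
    }
    return unit; // BMP character or a genuinely dangling surrogate
}

// Hypothetical class matcher; ranges is an array of [lo, hi] code point pairs.
function matchesClass(codePoint, ranges, inverted) {
    const inRange = ranges.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
    if (!inverted) {
        // -1 lies outside every valid range, so no explicit check is needed.
        return inRange;
    }
    // An inverted class would otherwise accept -1, so it must be rejected explicitly.
    return codePoint !== errorCodePoint && !inRange;
}
```

In this model only the inverted path pays for the extra comparison, which is the saving described in item 1.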
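
For optimization 2 above, the difference in emitted branches can be sketched with a toy emitter.  The
pseudo-instruction strings and the emitRangeChecks and countBranches helpers are assumptions made for this example;
they are not the YarrJIT API.

```javascript
// Toy emitter: returns pseudo-instructions for a class made only of [lo, hi] ranges.
function emitRangeChecks(ranges, optimizeSingleRange) {
    const code = [];
    if (optimizeSingleRange && ranges.length === 1) {
        const [lo, hi] = ranges[0];
        // Both conditional branches target the failure code directly.
        code.push(`branch-if ch < ${lo} -> fail`);
        code.push(`branch-if ch > ${hi} -> fail`);
        return code; // execution falls through on a match
    }
    ranges.forEach(([lo, hi], i) => {
        // A failed range check continues at the instruction after its two branches.
        code.push(`branch-if ch < ${lo} -> next${i}`);
        code.push(`branch-if ch <= ${hi} -> matched`);
        code.push(`next${i}:`);
    });
    // After all range checks, one more branch goes to the failure code.
    code.push("jump -> fail");
    code.push("matched:");
    return code;
}

// Labels end with ':' and are not instructions.
const countBranches = (code) => code.filter((line) => !line.endsWith(":")).length;

console.log(emitRangeChecks([[97, 122]], false).join("\n"));
console.log(emitRangeChecks([[97, 122]], true).join("\n"));
```

In this toy model the single-range case drops from three branch instructions to two.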
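
For optimization 3 above, the saved iterations can be modeled as follows.  This is a sketch that assumes the pattern
never matches; countIterations and extraWidth are hypothetical names, with extraWidth standing in for the dedicated
register.

```javascript
// Counts trips through the matching loop when every attempt fails.
function countIterations(subject, skipWholeCodePoint) {
    let index = 0;
    let iterations = 0;
    while (index < subject.length) {
        iterations++;
        const codePoint = subject.codePointAt(index);
        // 1 when the first character read was a surrogate pair, otherwise 0.
        const extraWidth = codePoint > 0xFFFF ? 1 : 0;
        // The match attempt fails here, so advance the index and retry.
        index += 1 + (skipWholeCodePoint ? extraWidth : 0);
    }
    return iterations;
}

// Two BMP characters and two non-BMP code points: six code units in total.
const sample = "a\u{1F600}b\u{1F601}";
console.log(countIterations(sample, false), countIterations(sample, true));
```

The difference between the two counts equals the number of non-BMP code points in the subject.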

Added a new test and updated the Test262 expectations file.

* JSTests/stress/regexp-unicode-dangling-surrogates.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
* JSTests/test262/expectations.yaml:
* Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:
(JSC::MacroAssemblerARM64::moveConditionallyTest32): Added to conditionally zero a register.
(JSC::MacroAssemblerARM64::addOneConditionally32): Added to conditionally increment a register.
* Source/JavaScriptCore/yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::InputStream::readChecked):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
* Source/JavaScriptCore/yarr/YarrJITRegisters.h:

Canonical link: https://commits.webkit.org/276031@main


