[webkit-changes] [WebKit/WebKit] 67969c: [JSC] RegExp /u flag doesn't respect atomicity of ...
Michael Saboff
noreply at github.com
Wed Mar 13 09:47:23 PDT 2024
Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 67969c218ddf357855d3c26ca4769b194fb1f4db
https://github.com/WebKit/WebKit/commit/67969c218ddf357855d3c26ca4769b194fb1f4db
Author: Michael Saboff <msaboff at apple.com>
Date: 2024-03-13 (Wed, 13 Mar 2024)
Changed paths:
A JSTests/stress/regexp-unicode-dangling-surrogates.js
M JSTests/test262/expectations.yaml
M Source/JavaScriptCore/assembler/MacroAssemblerARM64.h
M Source/JavaScriptCore/yarr/YarrInterpreter.cpp
M Source/JavaScriptCore/yarr/YarrJIT.cpp
M Source/JavaScriptCore/yarr/YarrJITRegisters.h
Log Message:
-----------
[JSC] RegExp /u flag doesn't respect atomicity of surrogate pairs
https://bugs.webkit.org/show_bug.cgi?id=267011
rdar://124217243
Reviewed by Yusuke Suzuki.
Fixed bug where a dangling surrogate in a pattern matches half a valid surrogate pair in a subject string.
Updated the reading of surrogates so that when we read starting in the middle of a valid surrogate pair, we return an error
code point which we never match. Updated backtracking for non-greedy character class matching to use the start index
as the appropriate index to reset when we fail to match, instead of doing math with the current match count.
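
As an illustration of the intended behavior (not taken from the added test file), the JavaScript sketch below shows what the /u flag should do with a dangling surrogate, followed by a simplified model of the surrogate read. The names `readCodePoint`, `isLead`, `isTrail`, and the local `errorCodePoint` constant are hypothetical stand-ins for the logic described above, not the actual Yarr code.

```javascript
// Expected /u semantics: a lone surrogate in the pattern must not match
// half of a valid surrogate pair in the subject string.
const subject = "\u{1F400}"; // one code point, two UTF-16 code units: D83D DC00

const unicodeLead = /\uD83D/u.test(subject);   // lead half of a valid pair
const unicodeTrail = /\uDC00/u.test(subject);  // trail half of a valid pair
const legacyLead = /\uD83D/.test(subject);     // no /u: matching is per code unit
const loneMatch = /\uD83D/u.test("\uD83Dx");   // a genuinely dangling surrogate

// Simplified model of the read (assumption: -1 marks a read that starts
// in the middle of a valid pair, mirroring the errorCodePoint idea).
const errorCodePoint = -1;
const isLead = (u) => u >= 0xD800 && u <= 0xDBFF;
const isTrail = (u) => u >= 0xDC00 && u <= 0xDFFF;

function readCodePoint(str, index) {
  const unit = str.charCodeAt(index);
  // A lead followed by a trail decodes to one supplementary code point.
  if (isLead(unit) && index + 1 < str.length && isTrail(str.charCodeAt(index + 1)))
    return 0x10000 + ((unit - 0xD800) << 10) + (str.charCodeAt(index + 1) - 0xDC00);
  // A trail preceded by a lead means we started mid-pair: never matchable.
  if (isTrail(unit) && index > 0 && isLead(str.charCodeAt(index - 1)))
    return errorCodePoint;
  // BMP character or a truly dangling surrogate: return the code unit itself.
  return unit;
}
```

In this model, a read at index 1 of `subject` yields the error value, while a lone `"\uDC00"` read at index 0 is returned unchanged and can still be matched by a dangling surrogate in the pattern.
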
The fix above originally landed Jan 13, but it regressed some Unicode performance tests and was subsequently rolled out.
This change builds on the prior fix by adding three optimizations to mitigate the performance loss in the earlier fix.
1. We don't need to check for the errorCodePoint (-1) if we read a dangling surrogate when we are matching normal,
non-inverted atoms. The errorCodePoint won't match in that case. For inverted atoms, we still need to check for the
errorCodePoint and fail matching that atom.
2. Changed the code emitted for a character class that has only one range. Before this change, we'd emit all range
checks with each range check's failure target being the instruction right after the two conditional branches.
This works fine if there is another range check. When all range checks have been performed, we add a branch to the
failure (backtracking) code.
If the character class has only one range and doesn't have any list of single characters, we can eliminate the branch
to failure code by changing the two conditional branches that make up a range check to go directly to the failure code.
This change appears to help JetStream2/babylon-wtp by at least 1.5%.
3. (ARM64 only) When we read a non-BMP code point, consisting of two surrogate code units, and we fail to match any atom
in the body of a RegExp, we were incrementing the subject string index by 1 and going back to the top of the loop to
start matching the pattern again. Now we dedicate a register to hold either 0 or 1 depending on the width of the first
character read for that loop iteration. When advancing the index for the next iteration, we add the value of that register
to the updated index. This eliminates one iteration through the matching loop for each non-BMP code point that doesn't
match.
This change appears to help JetStream2/UniPoker by 3+%.
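
The effect of the third optimization can be modeled in JavaScript. This is a simplified sketch rather than the JIT code: `countIterations` and its parameters are hypothetical, and the local `extraWidth` plays the role of the dedicated register that holds 0 or 1.

```javascript
// Scan `subject` for the first position where `matches(codePoint)` is true,
// counting how many times the matching loop body runs.
function countIterations(subject, matches, useWidthRegister) {
  let index = 0;
  let iterations = 0;
  while (index < subject.length) {
    iterations++;
    const codePoint = subject.codePointAt(index);
    if (matches(codePoint))
      return { index, iterations };
    // 1 if the character just read was a surrogate pair, else 0.
    const extraWidth = codePoint > 0xFFFF ? 1 : 0;
    // Old behavior: always advance by one code unit.
    // New behavior: also skip the trailing surrogate of a pair.
    index += 1 + (useWidthRegister ? extraWidth : 0);
  }
  return { index: -1, iterations };
}

const text = "\u{1F400}\u{1F401}A"; // two non-BMP code points, then "A"
const isA = (cp) => cp === 0x41;

const before = countIterations(text, isA, false);
const after = countIterations(text, isA, true);
```

For this input both variants find "A" at the same index, but the variant that adds the width value skips one loop iteration per non-matching non-BMP code point.
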
Added a new test and updated the Test262 expectations file.
* JSTests/stress/regexp-unicode-dangling-surrogates.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
* JSTests/test262/expectations.yaml:
* Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:
(JSC::MacroAssemblerARM64::moveConditionallyTest32): Added to conditionally zero a register.
(JSC::MacroAssemblerARM64::addOneConditionally32): Added to conditionally increment a register.
* Source/JavaScriptCore/yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::InputStream::readChecked):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
* Source/JavaScriptCore/yarr/YarrJITRegisters.h:
Canonical link: https://commits.webkit.org/276031@main