[webkit-changes] [WebKit/WebKit] 12c34e: [Yarr] Improve processing of an alternation of str...
Michael Saboff
noreply at github.com
Fri Feb 21 07:31:17 PST 2025
Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 12c34ef5e3058990f7e9997bc07ea1883c2a9eab
https://github.com/WebKit/WebKit/commit/12c34ef5e3058990f7e9997bc07ea1883c2a9eab
Author: Michael Saboff <msaboff at apple.com>
Date: 2025-02-21 (Fri, 21 Feb 2025)
Changed paths:
A JSTests/microbenchmarks/regexp-keyword-parsing.js
A JSTests/stress/regexp-parsing-tokens.js
M Source/JavaScriptCore/yarr/YarrJIT.cpp
M Source/JavaScriptCore/yarr/YarrPattern.cpp
M Source/JavaScriptCore/yarr/YarrPattern.h
Log Message:
-----------
[Yarr] Improve processing of an alternation of strings
https://bugs.webkit.org/show_bug.cgi?id=288102
rdar://145222010
Reviewed by Yusuke Suzuki.
Added the notion of a string list to a parsed RegExp that is in the form of
/^(?:break|case|which|do|for)/ with an optional trailing $.
Such a RegExp will not backtrack and therefore we can streamline the code we emit for such a pattern.
This change involves recognizing beginning of string anchored alternations of strings while parsing and
then treating the generation of JIT code differently for these patterns. This includes changing how
conditional branching works, specifically that instead of the "fall through on match" for each term,
to a "jump on match" for the whole alternation.
The current code generated for the "case" elternative is:
8:Term PatternCharacter checked-offset:(3) 'c'
<156> 0x11381430c: add w1, w1, #2
<160> 0x113814310: cmp w1, w2
<164> 0x113814314: b.hi 0x113814444 -> <468>
10:Term PatternCharacter checked-offset:(4) 'c'
<168> 0x113814318: sub x17, x0, #4
<172> 0x11381431c: ldr w17, [x17, x1]
<176> 0x113814320: movz w16, #0x6163
<180> 0x113814324: movk w16, #0x6573, lsl #16 -> 0x65736163
<184> 0x113814328: cmp w17, w16
<188> 0x11381432c: b.ne 0x113814444 -> <468>
11:Term PatternCharacter checked-offset:(4) 'a' already handled
12:Term PatternCharacter checked-offset:(4) 's' already handled
13:Term PatternCharacter checked-offset:(4) 'e' already handled
14:NestedAlternativeNext minimum-size:(5),checked-offset:(5)
<192> 0x113814330: movz x16, #0x4444
<196> 0x113814334: movk x16, #0x1381, lsl #16
<200> 0x113814338: movk x16, #0x8001, lsl #32
<204> 0x11381433c: movk x16, #0xc973, lsl #48 -> 0x113814444 JIT PC
<208> 0x113814340: stur x16, [sp, #8]
<212> 0x113814344: b 0x113814404 -> <404>
With some additional backtracking code:
9:NestedAlternativeNext minimum-size:(4),checked-offset:(4)
<468> 0x113814444: sub w1, w1, #2
<472> 0x113814448: b 0x113814348 -> <216>
With this change, the processing of "case" becomes:
9:StringListAlternativeNext minimum-size:(4),checked-offset:(4)
<132> 0x12a8285c4: sub w1, w1, #1
<136> 0x12a8285c8: cmp w1, w2
<140> 0x12a8285cc: b.hi 0x12a8285e8 -> <168>
10:Term PatternCharacter checked-offset:(4) 'c'
<144> 0x12a8285d0: sub x17, x0, #4
<148> 0x12a8285d4: ldr w17, [x17, x1]
<152> 0x12a8285d8: movz w16, #0x6163
<156> 0x12a8285dc: movk w16, #0x6573, lsl #16 -> 0x65736163
<160> 0x12a8285e0: cmp w17, w16
<164> 0x12a8285e4: b.eq 0x12a82866c -> <300>
11:Term PatternCharacter checked-offset:(4) 'a' already handled
12:Term PatternCharacter checked-offset:(4) 's' already handled
13:Term PatternCharacter checked-offset:(4) 'e' already handled
14:StringListAlternativeNext minimum-size:(5),checked-offset:(5)
With no backtracking code.
We are able to eliminate one branch and the saving of the continuation PC for backtracking.
The code size to process these string list RegExp is reduces. For the example RegExp above,
the prior version created 1940 bytes (485 instructions) of code while the code created with this
1392 bytes (345 instructions) of code, a nearly 30% reduction in code.
This change is a ~18% progression on the new regexp-keyword-parsing microbenchmark:
Baseline YarrStringList
regexp-keyword-parsing 136.7065+-0.9807 ^ 116.0161+-1.1791 ^ definitely 1.1783x faster
<geometric> 136.7065+-0.9807 ^ 116.0161+-1.1791 ^ definitely 1.1783x faster
* JSTests/microbenchmarks/regexp-keyword-parsing.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
(let.re.break.case.catch.continue.debugger.default.else.finally.if):
(let.re1.break.case.catch.continue.debugger.default.else.finally.if):
* JSTests/stress/regexp-parsing-tokens.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::atomParenthesesEnd):
(JSC::Yarr::YarrPatternConstructor::checkForTerminalParentheses):
(JSC::Yarr::PatternAlternative::dump):
(JSC::Yarr::PatternTerm::dump):
* Source/JavaScriptCore/yarr/YarrPattern.h:
(JSC::Yarr::PatternTerm::PatternTerm):
(JSC::Yarr::PatternAlternative::PatternAlternative):
Canonical link: https://commits.webkit.org/290791@main
To unsubscribe from these emails, change your notification settings at https://github.com/WebKit/WebKit/settings/notifications
More information about the webkit-changes
mailing list