[Webkit-unassigned] [Bug 258706] JS markdown parser performs 50x slower in JSC compared to V8, likely due to regex

bugzilla-daemon at webkit.org
Wed Jul 26 14:24:46 PDT 2023


https://bugs.webkit.org/show_bug.cgi?id=258706

--- Comment #2 from Michael Saboff <msaboff at apple.com> ---
We are running several RegExps through the YARR interpreter. I instrumented the RegExp engine to determine which ones, and why. Five RegExps contain variable-counted parentheses with a non-zero minimum; each of them contains one or more disjunctions like (?:\*[ \t]*){3,}. One RegExp contains a back reference that currently cannot be JIT’ed because the RegExp is compiled for 16-bit character matching and has the ignore-case flag.
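To check whether a given pattern falls into one of the two categories above, a crude source-text heuristic may help. The TypeScript sketch below is hypothetical: fallbackReasons, COUNTED_PARENS, and BACKREF are illustrative names, it is not JSC's actual JIT-eligibility check, and it can misfire on escaped text such as \) or \\1. It also assumes that "16-bit" applies when the subject string contains a code unit above 0xFF, which is only an approximation of how JSC picks its string representation.

```typescript
// Hypothetical heuristic mirroring the two interpreter-fallback reasons above.
// It inspects only the pattern source text; it is NOT JSC's real check.
interface FallbackReasons {
  minCountedParens: boolean;         // e.g. (?:...){3,} or (?:...){2,5}
  ignoreCaseBackrefOn16Bit: boolean; // back reference + i flag + 16-bit subject
}

// A ")" followed by {n,} or {n,m}, with n >= 1.
const COUNTED_PARENS = /\)\{([1-9]\d*),(\d*)\}/g;
// A numbered (\1..\9) or named (\k<name>) back reference.
const BACKREF = /\\(?:[1-9]|k<)/;

// Assumption: a subject with any code unit above 0xFF needs 16-bit matching.
function isLatin1(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

function fallbackReasons(re: RegExp, subject: string): FallbackReasons {
  let minCountedParens = false;
  COUNTED_PARENS.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = COUNTED_PARENS.exec(re.source)) !== null) {
    const min = Number(m[1]);
    const max = m[2] === "" ? Infinity : Number(m[2]);
    // Only a variable count (max > min) with a non-zero minimum qualifies.
    if (max > min) minCountedParens = true;
  }
  const ignoreCaseBackrefOn16Bit =
    re.ignoreCase && BACKREF.test(re.source) && !isLatin1(subject);
  return { minCountedParens, ignoreCaseBackrefOn16Bit };
}

// Usage: the example disjunction from the comment, and an ignore-case back reference.
const markdownLike = /^(?:\*[ \t]*){3,}$/;
console.log(fallbackReasons(markdownLike, "* * *"));
console.log(fallbackReasons(/(\w+) \1/i, "\u0100\u0100 \u0100\u0100"));
```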

The first issue, variable-counted parentheses with a non-zero minimum, could be addressed in at least two ways:
1) Rewrite the RegExp, e.g. change (?:\*[ \t]*){3,} to (?:\*[ \t]*){3}(?:\*[ \t]*)*. We already do this for one-or-more variable-counted parens, i.e. (?:\*[ \t]*)+
2) Do more involved work so that the JIT directly handles the fixed, non-zero minimum count of variable-counted parens.
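Option 1 relies on the identity that a group quantified {n,} matches the same strings as n fixed copies followed by a star. The TypeScript sketch below (rewriteMinCountedGroup is a hypothetical name, not a JSC function) performs that rewrite for a non-capturing group and checks that both forms agree on a few inputs. It deliberately refuses capturing groups: duplicating one would create an extra capture group and shift the numbering that later back references depend on.

```typescript
// Hypothetical helper sketching option 1: rewrite (?:body){min,} as
// (?:body){min}(?:body)*. Only non-capturing groups are handled, because
// duplicating a capturing group would add a new capture group number.
function rewriteMinCountedGroup(body: string, min: number): string {
  if (Math.floor(min) !== min || min < 1) {
    throw new RangeError("min must be a positive integer");
  }
  const group = "(?:" + body + ")";
  return group + "{" + min + "}" + group + "*";
}

// The disjunction from the comment: \*[ \t]*
const starBody = "\\*[ \\t]*";
const original = new RegExp("^(?:" + starBody + "){3,}$");
const rewritten = new RegExp("^" + rewriteMinCountedGroup(starBody, 3) + "$");

console.log(rewritten.source); // ^(?:\*[ \t]*){3}(?:\*[ \t]*)*$

// Both forms should accept and reject exactly the same inputs.
for (const input of ["***", "* * *", "*\t*  * *", "**", "* * x"]) {
  console.log(JSON.stringify(input), original.test(input), rewritten.test(input));
}
```

Greediness is preserved as well: {3,} is greedy, and so is the trailing *, so the rewrite also yields the same match positions, not just the same yes/no answers.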

We don’t support back references when JIT’ing 16-bit ignore-case RegExps because of the complicated case-folding rules for some Unicode characters. Again, there are two possible options for addressing this bug:
1) If the RegExp contains back references, allow JIT’ing when the referenced group’s contents are easily case-folded. This fix would easily handle 8-bit characters.
2) Handle Unicode case folding completely. This could build on the work for the first alternative. A full implementation would require calling out to a case-folding helper for some patterns; this helper could be generated as needed.
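For reference, the ECMAScript specification defines ignore-case matching, including back references, in terms of a Canonicalize operation. The TypeScript sketch below reimplements the non-unicode variant of that operation with ordinary string functions and uses it for a code-unit-by-code-unit back-reference comparison; canonicalize and backrefMatchesIgnoreCase are illustrative names, not JSC functions. It also shows one reason full Unicode handling is harder: with the u flag the specification switches to Unicode simple case folding, under which U+017F (long s) matches "s" and U+212A (Kelvin sign) matches "k", while without the u flag they do not.

```typescript
// Non-unicode ignore-case Canonicalize, following the ECMAScript spec:
// upper-case the code unit, but keep the original when the result is not a
// single code unit (e.g. "\u00DF" -> "SS") or would map non-ASCII onto ASCII.
function canonicalize(ch: string): string {
  const upper = ch.toUpperCase();
  if (upper.length !== 1) return ch;
  if (ch.charCodeAt(0) >= 128 && upper.charCodeAt(0) < 128) return ch;
  return upper;
}

// Compare a captured group against the input at `pos`, one code unit at a
// time, the way a non-unicode ignore-case back reference does.
function backrefMatchesIgnoreCase(captured: string, input: string, pos: number): boolean {
  if (pos + captured.length > input.length) return false;
  for (let i = 0; i < captured.length; i++) {
    if (canonicalize(captured.charAt(i)) !== canonicalize(input.charAt(pos + i))) {
      return false;
    }
  }
  return true;
}

console.log(backrefMatchesIgnoreCase("Abc", "xaBC", 1)); // true

// With the u flag, simple case folding adds equivalences the i-only mode lacks.
console.log(new RegExp("\u017F", "i").test("s"));  // false
console.log(new RegExp("\u017F", "iu").test("s")); // true
console.log(new RegExp("\u212A", "i").test("k"));  // false
console.log(new RegExp("\u212A", "iu").test("k")); // true
```

Here every comparison pays for a full toUpperCase call. A JIT would presumably precompute the 8-bit range (option 1) and defer the rest to an out-of-line helper (option 2), which matches the split described in the comment.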
