[webkit-reviews] review granted: [Bug 228810] [JSC] Yarr's character tracking for BoyerMoore search should be more precise : [Attachment 434965] Patch
bugzilla-daemon at webkit.org
bugzilla-daemon at webkit.org
Thu Aug 5 15:14:32 PDT 2021
Saam Barati <sbarati at apple.com> has granted Yusuke Suzuki <ysuzuki at apple.com>'s
request for review:
Bug 228810: [JSC] Yarr's character tracking for BoyerMoore search should be
more precise
https://bugs.webkit.org/show_bug.cgi?id=228810
Attachment 434965: Patch
https://bugs.webkit.org/attachment.cgi?id=434965&action=review
--- Comment #3 from Saam Barati <sbarati at apple.com> ---
Comment on attachment 434965
--> https://bugs.webkit.org/attachment.cgi?id=434965
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=434965&action=review
r=me
> Source/JavaScriptCore/ChangeLog:8
> + We should track candidate characters without masking up to 2 so
let's fix this first sentence to be more clear like you suggested on slack.
> Source/JavaScriptCore/ChangeLog:12
> + We also remove m_table giving up. This is because m_table can be
used
"We also remove m_table giving up...." => "We also add support for m_table
because m_table can be used"
> Source/JavaScriptCore/ChangeLog:17
> + To make `\s` work on Char8 case, we use Char8 / Char16 information
> + effectively in BoyreMoore search bitmap construction.
how so? It'd be helpful to share more detail here, like, what makes it
"effective"
> Source/JavaScriptCore/yarr/YarrJIT.cpp:2414
> + auto [map, characters] =
op.m_bmInfo->createCandidateBitmap(beginIndex, endIndex);
nit: "characters" is a generic name given this context. Similar suggestion to
naming the class. maybe "charactersFastPath" or something?
> Source/JavaScriptCore/yarr/YarrJIT.cpp:2429
> UChar32 character2 = 0xff;
> - if (mapCount == 2) {
> - character2 = map.findBit(character1 + 1,
true);
> - ASSERT(character2 !=
BoyerMooreBitmap::Map::size());
> - }
> - dataLogLnIf(Options::verboseRegExpCompilation(),
"Found 1-or-2 characters lookahead character:(0x", hex(character1),
"),character2:(", hex(character2), "),isMaskEffective:(",
isMaskEffective,"),range:[", beginIndex, ", ", endIndex, ")");
> + if (characters.size() > 1)
> + character2 = characters.at(1);
> + dataLogLnIf(Options::verboseRegExpCompilation(),
"Found ", characters.size(), " characters lookahead character:(0x",
hex(character1), "),character2:(", hex(character2), "),range:[", beginIndex, ",
", endIndex, ")");
Can all of the "character2", "characters.size() > 1" just be moved into
"characters.size() > 1" branch below? I think that'd make things a lot more
clear. No need to set to 0xff, etc.
> Source/JavaScriptCore/yarr/YarrJIT.h:63
> +class BoyerMooreCharacterCandidates {
this name feels weirdly generic. I wonder if we can go with
"BoyerMooreFastCase" or "BoyerMooreFastCandidates" or something similar that's
just a bit more descriptive of what we're aiming for? Since this isn't really a
comprehensive list of "character candidates".
More information about the webkit-reviews
mailing list