[webkit-reviews] review granted: [Bug 228810] [JSC] Yarr's character tracking for BoyerMoore search should be more precise : [Attachment 434965] Patch

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Thu Aug 5 15:14:32 PDT 2021


Saam Barati <sbarati at apple.com> has granted Yusuke Suzuki <ysuzuki at apple.com>'s
request for review:
Bug 228810: [JSC] Yarr's character tracking for BoyerMoore search should be
more precise
https://bugs.webkit.org/show_bug.cgi?id=228810

Attachment 434965: Patch

https://bugs.webkit.org/attachment.cgi?id=434965&action=review




--- Comment #3 from Saam Barati <sbarati at apple.com> ---
Comment on attachment 434965
  --> https://bugs.webkit.org/attachment.cgi?id=434965
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434965&action=review

r=me

> Source/JavaScriptCore/ChangeLog:8
> +	   We should track candidate characters without masking up to 2 so

let's fix this first sentence to be more clear like you suggested on slack.

> Source/JavaScriptCore/ChangeLog:12
> +	   We also remove m_table giving up. This is because m_table can be
used

"We also remove m_table giving up...." => "We also add support for m_table
because m_table can be used"

> Source/JavaScriptCore/ChangeLog:17
> +	   To make `\s` work on Char8 case, we use Char8 / Char16 information
> +	   effectively in BoyreMoore search bitmap construction.

how so? It'd be helpful to share more detail here, like, what makes it
"effective"

> Source/JavaScriptCore/yarr/YarrJIT.cpp:2414
> +			   auto [map, characters] =
op.m_bmInfo->createCandidateBitmap(beginIndex, endIndex);

nit: "characters" is a generic name given this context. Similar suggestion to
naming the class. maybe "charactersFastPath" or something?

> Source/JavaScriptCore/yarr/YarrJIT.cpp:2429
>			       UChar32 character2 = 0xff;
> -			       if (mapCount == 2) {
> -				   character2 = map.findBit(character1 + 1,
true);
> -				   ASSERT(character2 !=
BoyerMooreBitmap::Map::size());
> -			       }
> -			       dataLogLnIf(Options::verboseRegExpCompilation(),
"Found 1-or-2 characters lookahead character:(0x", hex(character1),
"),character2:(", hex(character2), "),isMaskEffective:(",
isMaskEffective,"),range:[", beginIndex, ", ", endIndex, ")");
> +			       if (characters.size() > 1)
> +				   character2 = characters.at(1);
> +			       dataLogLnIf(Options::verboseRegExpCompilation(),
"Found ", characters.size(), " characters lookahead character:(0x",
hex(character1), "),character2:(", hex(character2), "),range:[", beginIndex, ",
", endIndex, ")");

Can all of the "character2", "characters.size() > 1" just be moved into
"characters.size() > 1" branch below? I think that'd make things a lot more
clear. No need to set to 0xff, etc.

> Source/JavaScriptCore/yarr/YarrJIT.h:63
> +class BoyerMooreCharacterCandidates {

this name feels weirdly generic. I wonder if we can go with
"BoyerMooreFastCase" or "BoyerMooreFastCandidates" or something similar that's
just a bit more descriptive of what we're aiming for? Since this isn't really a
comprehensive list of "character candidates".


More information about the webkit-reviews mailing list