[Webkit-unassigned] [Bug 38117] New: Differences between subpattern matching in use of pcre and Yarr Interpreter
bugzilla-daemon at webkit.org
Mon Apr 26 05:36:12 PDT 2010
https://bugs.webkit.org/show_bug.cgi?id=38117
Summary: Differences between subpattern matching in use of pcre
and Yarr Interpreter
Product: WebKit
Version: 528+ (Nightly build)
Platform: All
OS/Version: All
Status: UNCONFIRMED
Severity: Normal
Priority: P2
Component: JavaScriptCore
AssignedTo: webkit-unassigned at lists.webkit.org
ReportedBy: pvarga at inf.u-szeged.hu
CC: ggaren at apple.com, barraclough at apple.com,
zherczeg at webkit.org
I have found some cases where the Yarr JIT falls back to pcre, but the
subpattern matching results differ between pcre and the Yarr Interpreter.
example 1:
var pat = /(a(b)*)*/;
var str = "aba";
print(str.match(pat));
pcre's result: aba,a,b
yarr's result: aba,a,
In this case pcre doesn't match the 'b' subpattern in the second iteration,
because the matching algorithm has reached the end of the input string. Thus
the first iteration's result remains in the output vector. Yarr's algorithm,
however, tries to match the 'b' subpattern again at the end of the input
string. I think that, according to the yarr algorithm, the outer subpattern
should also be tried in a third iteration, but in that case yarr's result
should be "aba,," because the 'a' subpattern doesn't match in the third
iteration either.
example 2:
var pat = /(a*)*/;
var str = "ab";
print(str.match(pat));
pcre's result: a,
yarr's result: a,a
This situation is similar to the first example, but here a non-matching
character blocks the subpattern from matching, rather than the end of the
input string. Yarr keeps the first match of the subpattern, but pcre doesn't.
example 3:
var pat = /([ab]*)*/;
var str = "abab";
print(str.match(pat));
pcre's result: abab,
yarr's result: abab,abab
IMHO yarr's way is correct in this case, because pcre tries to match the
subpattern character by character instead of in one iteration.
Which is the correct behaviour in each example? Which regex engine needs a fix?
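
To make the comparison easier to repeat, the three cases above could be run
together with a small script like the sketch below. It is only an
illustration: the helper runCase and the field names (pcre, yarr, agrees) are
made up for this example, and the pcre/yarr strings are copied from the
results quoted above rather than computed. console.log is assumed here; in the
jsc shell, print() could be used instead. The script makes no claim about
which engine is correct, it only reports which quoted result the running
engine agrees with.

```javascript
// Each case pairs a pattern and input from this report with the two
// results quoted in it. The `pcre` and `yarr` strings are copied from
// the report text, not produced by this script.
const cases = [
  { pat: /(a(b)*)*/, str: "aba", pcre: "aba,a,b", yarr: "aba,a," },
  { pat: /(a*)*/, str: "ab", pcre: "a,", yarr: "a,a" },
  { pat: /([ab]*)*/, str: "abab", pcre: "abab,", yarr: "abab,abab" },
];

// Hypothetical helper: run one case and stringify the match array the
// same way print(str.match(pat)) does, so that unmatched captures show
// up as empty fields between commas.
function runCase(c) {
  const actual = String(c.str.match(c.pat));
  let agrees = "neither";
  if (actual === c.pcre) agrees = "pcre";
  else if (actual === c.yarr) agrees = "yarr";
  return { source: c.pat.source, actual: actual, agrees: agrees };
}

const results = cases.map(runCase);
for (const r of results) {
  console.log("/" + r.source + "/ -> " + r.actual +
              " (agrees with: " + r.agrees + ")");
}
```

Running the same script in builds with and without the fallback should show
at a glance which of the two quoted results each build produces.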