[Webkit-unassigned] [Bug 38117] New: Differences between subpattern matching in use of pcre and Yarr Interpreter

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Apr 26 05:36:12 PDT 2010


https://bugs.webkit.org/show_bug.cgi?id=38117

           Summary: Differences between subpattern matching in use of pcre
                    and Yarr Interpreter
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: All
        OS/Version: All
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P2
         Component: JavaScriptCore
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: pvarga at inf.u-szeged.hu
                CC: ggaren at apple.com, barraclough at apple.com,
                    zherczeg at webkit.org


I have found some cases where the Yarr JIT falls back to pcre, but the
subpattern (capture group) results differ between pcre and the Yarr
Interpreter.

example 1:

var pat = /(a(b)*)*/;
var str = "aba";
print(str.match(pat));

pcre's result: aba,a,b
yarr's result: aba,a,

In this case pcre does not match the 'b' subpattern in the second iteration,
because the matching algorithm has reached the end of the input string, so the
first iteration's result remains in the output vector. Yarr's algorithm,
however, attempts to match the 'b' subpattern again at the end of the input
string, leaving it empty. I think that, following yarr's algorithm, the outer
subpattern should also be tried in a third iteration, in which case yarr's
result should be "aba,," because the 'a' subpattern does not match in the
third iteration either.
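
Note that print() joins the match array with commas, so an unset capture
(undefined) and an empty-string capture both show up as nothing after the
comma. A minimal sketch for telling the two apart follows; describeCaptures is
a hypothetical helper name used only for illustration, not part of pcre or
Yarr.

```javascript
// Hypothetical helper: render each capture so that undefined and ""
// are distinguishable (print(str.match(pat)) shows both as empty).
function describeCaptures(pat, str) {
  var m = pat.exec(str);
  if (m === null) return null;
  var out = [];
  for (var i = 0; i < m.length; i++) {
    out.push(m[i] === undefined ? "undefined" : JSON.stringify(m[i]));
  }
  return out;
}

// Engine-independent sanity checks:
var unset = describeCaptures(/(x)?y/, "y");  // group 1 did not participate
var empty = describeCaptures(/(x*)y/, "y");  // group 1 matched the empty string

// The case from example 1; the second capture is the one in question.
var example1 = describeCaptures(/(a(b)*)*/, "aba");
```

Printing example1.join(" | ") on the engine under test shows whether the
trailing field in "aba,a," is undefined or an empty string.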


example 2:

var pat = /(a*)*/;
var str = "ab";
print(str.match(pat));

pcre's result: a,
yarr's result: a,a

This case is similar to the first example, but here a non-matching character,
rather than the end of the input string, stops the subpattern from matching.
Yarr keeps the subpattern's first match, but pcre does not.


example 3:

var pat = /([ab]*)*/;
var str = "abab";
print(str.match(pat));

pcre's result: abab,
yarr's result: abab,abab

IMHO yarr's behaviour is correct in this case, because pcre tries to match the
subpattern character by character instead of in a single iteration.


Which is the correct behaviour in each example? Which regex engine needs a fix?
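
To check a given build against the three cases at once, a small script along
these lines could be used. It is only a sketch: the expected strings are
copied from the examples above, and the names cases and classify are
hypothetical. String() is used because it joins the match array with commas in
the same way print() does.

```javascript
// Reported outputs, copied from the three examples above.
var cases = [
  { pat: /(a(b)*)*/, str: "aba",  pcre: "aba,a,b", yarr: "aba,a," },
  { pat: /(a*)*/,    str: "ab",   pcre: "a,",      yarr: "a,a" },
  { pat: /([ab]*)*/, str: "abab", pcre: "abab,",   yarr: "abab,abab" }
];

// Report which of the two reported behaviours the running engine shows.
function classify(c) {
  var actual = String(c.str.match(c.pat));
  if (actual === c.yarr) return "yarr";
  if (actual === c.pcre) return "pcre";
  return "neither (" + actual + ")";
}

var report = cases.map(function (c) {
  return "/" + c.pat.source + "/ on \"" + c.str + "\": " + classify(c);
});

// console is assumed here; in the jsc shell, print(report.join("\n")) would
// be the equivalent.
if (typeof console !== "undefined") {
  console.log(report.join("\n"));
}
```

Each line of the report names the engine whose reported output the running
build reproduces, or shows the actual output if it matches neither.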

