[webkit-reviews] review granted: [Bug 25165] First step towards replacing PCRE and completing regular expression JIT. : [Attachment 29442] Yarr!

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Apr 13 17:30:23 PDT 2009

Geoffrey Garen <ggaren at apple.com> has granted 's request for review:
Bug 25165: First step towards replacing PCRE and completing regular expression

Attachment 29442: Yarr!

------- Additional Comments from Geoffrey Garen <ggaren at apple.com>
I'm ignoring commented-out code, because Gavin told me that he had a local
change to remove it.

Let's rename the NARC #define to YARR_JIT.


You can reduce the #ifdef'ing if you leave in ~RegExp in all builds, and just
leave out the implementation in YARR builds.

+    jitCompileRegex(globalData, m_regExpJITCode, m_pattern, m_numSubpatterns,
m_constructionError, ignoreCase(), multiline());
+    UNUSED_PARAM(globalData);
+    m_regExpBytecode.set(Yarr::byteCompileRegex(m_pattern, m_numSubpatterns,
m_constructionError, ignoreCase(), multiline()));

Kind of strange that the JIT version of the code does not use the Yarr
namespace explicitly, but the bytecode version of the code does.

+CharacterClass* newlineOnce()
+    static CharacterClass* once = newlineCreate();
+    return once;

DANGER WILL ROBINSON. DANGER. Static initialization like this is not
thread-safe. If we want to dynamically allocate these character classes, we
need to store them in the parser, or the JSGlobalData.

I think function names like this would be easier to read as
"newlineCharacterClass()", etc., instead of "newlineOnce()", etc.

+    // If the pattern contains illegal backreferences reset & reparse.
+    if (pattern.containsIllegalBackReference()) {

Let's be more explicit and mention that we're doing this to match a quirk in
Firefox's behavior.

+    DisjunctionContext* allocDisjunctionContext(ByteDisjunction* disjunction)
+    {
+	 return new(malloc(sizeof(DisjunctionContext) +
(disjunction->m_frameSize - 1) * sizeof(uintptr_t))) DisjunctionContext();
+    }
+    void freeDisjunctionContext(DisjunctionContext* context)
+    {
+	 free(context);
+    }

I think this code, and its client, would be cleaner if you made
DisjunctionContext refcounted, and used RefPtr to manage its lifetime

+    BytecodePattern* compile()
+    {
+	 regexBegin(m_pattern.m_body->m_callFrameSize);
+	 emitDisjunction(m_pattern.m_body);
+	 regexEnd();
+	 BytecodePattern* bytecode = new BytecodePattern(bodyDisjunction,
m_pattern.m_ignoreCase, m_pattern.m_multiline, m_pattern.m_numSubpatterns,
m_allParenthesesInfo, m_pattern.m_userCharacterClasses);
+	 bodyDisjunction = 0;
+	 m_allParenthesesInfo.shrink(0);
+	 m_pattern.m_userCharacterClasses.clear();
+	 return bytecode;
+    }

RefPtr here would be good, too.

+    ~BytecodePattern()
+    {
+	 delete m_body;
+	 for (unsigned i = 0; i < m_allParenthesesInfo.size(); ++i)
+	     delete m_allParenthesesInfo[i];
+	 for (unsigned i = 0; i < m_userCharacterClasses.size(); ++i)
+	     delete m_userCharacterClasses[i];
+    }

And here.

+struct RegexJITObject {
+    void* m_jitCode;
+    RefPtr<ExecutablePool> m_executablePool;
+    RegexJITObject()
+	 : m_jitCode(0)
+    {
+    }

Should we call this something more specific, like Yarr::CodeBlock? "Object" is
a pretty generic name for an object.

+// Yarr::parse():

Please use "/**/" syntax for multiline comments.

+    static const unsigned s_maxPatternSize = (1 << 13);

I think you want to standardize with the use of MAX_PATTERN_SIZE elsewhere, to
avoid differences between the engines.


More information about the webkit-reviews mailing list