[Webkit-unassigned] [Bug 38142] RegExp caching

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue May 18 20:31:34 PDT 2010


https://bugs.webkit.org/show_bug.cgi?id=38142


Geoffrey Garen <ggaren at apple.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #56362|review?                     |review-
               Flag|                            |




--- Comment #15 from Geoffrey Garen <ggaren at apple.com>  2010-05-18 20:31:32 PST ---
(From update of attachment 56362)
+    RefPtr<RegExp> cachedRegExp;
+
+    if (!flags.isNull())
+        cachedRegExp = RegExp::create(m_globalData, patternString, flags);
+    else
+        cachedRegExp = RegExp::create(m_globalData, patternString);

This code would be much simpler if you changed RegExp::create to always take a flags argument, but do something efficient when it is null. That might be a good thing to do in a future patch.


+    if (patternString.size() < maxCacheablePatternLength) {
+        ++m_ptr;
+        if (m_ptr >= maxCacheableEntries) {
+            m_ptr = 0;
+            m_isFull = true;
+        }
+        if (m_isFull)
+            m_cacheMap.remove(RegExpKey(patternKeyArray[m_ptr].flagsValue, patternKeyArray[m_ptr].pattern));
+
+        RegExpKey key = RegExpKey(flags, patternString);
+        m_cacheMap.set(key, cachedRegExp);
+        patternKeyArray[m_ptr].flagsValue = key.flagsValue;
+        patternKeyArray[m_ptr].pattern = patternString.rep();
+    }
+    return cachedRegExp;

The WebKit style for code like this is to check for the failure case and return early. One advantage of this style is that less code ends up indented. So:

if (patternString.size() >= maxCacheablePatternLength)
    return cachedRegExp;

I wouldn't call the regexp 'cachedRegExp', nor would I call the function 'createAndCache', since the regexp might not be cached. How about "regeExp" and "create" instead?

You define m_isFull as a data member, but only use it locally in createAndCache. I think you can get rid of it entirely, and just rely on "if (m_ptr >= maxCacheableEntries)" in createAndCache.

lookupOrCreate and createAndCache combine to do more hash lookups and RegExpKey construction than necessary. Here's what I'd recommend:

1. lookupOrCreate should say "pair <iterator, bool> result = m_cacheMap.add(RegExpKey(flags, patternString), 0)"
2. if result.second is false, there was a pre-existing entry in the map. You can return it.
3. if result.second is true, you've added a 0 to the map, and result.first is an iterator pointing at that zero. You can now use that iterator to add a real value to the map.

I think that patternKeyArray is unnecessary complexity. I guess your goal is LRU eviction. However, that's not what you'll get. When the cache is full, you'll evict the oldest item, and whatever item happens to collide with the item you're trying to add. There's no guarantee that evicting the oldest item will solve your collision problem.

You could go to greater lengths to implement a strict LRU cache. But what I would recommend is just to evict upon collision. All this would require is changing the call to add() that I suggested into a call to set() in the case where "m_cacheMap.size() > maxCacheableEntries".

r- because I think there are still good improvements to be made here.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.



More information about the webkit-unassigned mailing list