[Webkit-unassigned] [Bug 41238] RegExp performance slow on Dromaeo benchmark

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Jun 28 15:20:38 PDT 2010


https://bugs.webkit.org/show_bug.cgi?id=41238


Darin Adler <darin at apple.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #59949|review?                     |review-
               Flag|                            |




--- Comment #4 from Darin Adler <darin at apple.com>  2010-06-28 15:20:38 PST ---
(From update of attachment 59949)
This looks great!

> +    , m_lastMatchStr(UString::null())

This shouldn't be needed. A UString will start out null.

> +    , m_lastMatchStart(-1)
>  {
> +    m_lastOVector.resize(0);

This shouldn't be needed. The vector should start out empty.

> +    , m_lastMatchStr(UString::null())

This shouldn't be needed. A UString will start out null.

> +    , m_lastMatchStart(-1)
>  {
> +    m_lastOVector.resize(0);

This shouldn't be needed. The vector should start out empty.

> +        m_lastMatchStr = UString::null();

I think this should just be:

    m_lastMatchStr = UString();

You should check which is more efficient, but I think my form is.

> +        m_lastOVector.resize(0);

It's more efficient to call shrink(0) than resize(0).

> +    // Check to see if this match call is the same as the last match invocation

We normally use sentence style for comments, with a period.

> +    if ((startOffset == m_lastMatchStart) && (s.rep() == m_lastMatchStr.rep())) {

We normally

> +        if (ovector) {
> +            (Vector<int, 32>&)*ovector = (Vector<int, 32>&)m_lastOVector;
> +        }

I don't understand why you need these typecasts. They should not be needed.

Also, we don't use braces around single line if statement bodies. It should be:

    if (ovector)
        *ovector = m_lastOVector;

> +        if (m_lastOVector.isEmpty())
> +            return -1;
> +        else
> +            return m_lastOVector.at(0);

We normally don't do else after return, so the else should be removed.

> +        if (ovector)
> +            (Vector<int, 32>&)m_lastOVector = (Vector<int, 32>&)*ovector;
> +        else
> +            (Vector<int, 32>&)m_lastOVector = (Vector<int, 32>&)nonReturnedOvector;

Same comments about typecasts.

>          unsigned m_numSubpatterns;
> +        UString m_lastMatchStr;

We normally try to avoid abbreviations. So I suggest you either call this m_lastMatchString or m_lastMatch and leave out the "str".

>      ALWAYS_INLINE bool operator==(const UString& s1, const UString& s2)
>      {
>          unsigned size = s1.size();
> +        
> +        if (size != s2.size())      // Sizes not the same, we're done.
> +            return false;
> +            
> +        if (s1.data() == s2.data()) // Quick check of data pointers being the same.
> +            return true;

Another way to do this optimization is to check if s1.rep() is == s2.rep(). That would be more efficient than the repeated calls to data() this function has. And I think it should be relatively rare to have the two data pointers to match if they are not the exact same rep. You could put the rep check before the size check, in fact.

review- because I think you should do at least one of the things I suggested above

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.



More information about the webkit-unassigned mailing list