[webkit-reviews] review denied: [Bug 32831] Replace UString::Rep implementation, following introduction of ropes to JSC. : [Attachment 45339] The patch

Mon Dec 21 11:51:15 PST 2009

Darin Adler <darin at apple.com> has denied Gavin Barraclough
<barraclough at apple.com>'s request for review:
Bug 32831: Replace UString::Rep implementation, following introduction of ropes
to JSC.
https://bugs.webkit.org/show_bug.cgi?id=32831

Attachment 45339: The patch
https://bugs.webkit.org/attachment.cgi?id=45339&action=review

------- Additional Comments from Darin Adler <darin at apple.com>
It would have been nice to have smaller patches. For example, one patch could
have removed all the unneeded stuff without changing anything that was left
behind.

> +    UString::Rep m_base;

Can we start calling these UStringImpl in code we're modifying, instead of
using the typedef? I think a later patch should just eliminate "Rep" entirely.

>      // make sure UString doesn't try to reuse the buffer by pretending we
have one more character in it

This comment should go. You removed all the code that went with it.

> +UStringImpl* UStringImpl::nullUStringImpl;
> +UStringImpl* UStringImpl::emptyUStringImpl;

Can we give these names that don't repeat the class name? If you are using the
"s_" naming, s_null and s_empty would seem to do.

> +	   // UString constructors passed char*s assume 7-bit ASCII enciding;
for UTF8 use 'createFromUTF8', belwo.

Typo: "enciding".
Typo: "belwo".

The functions zero-extend the characters, so they don't assume ASCII encoding.
They do ISO Latin-1, not UTF-8. I think the comment could be worded more
precisely.

If the function assumed ASCII encoding then it would be illegal to pass a
non-ASCII character and we would assert that it was not done. Maybe we should
switch to that.

> -	   // Constructor for null-terminated ASCII string.
>	   UString(const char*);

A shame to lose the "null-terminated" aspect of this comment.

> Index: JavaScriptCore/runtime/UStringImpl.cpp
> ===================================================================
> --- JavaScriptCore/runtime/UStringImpl.cpp	(revision 0)
> +++ JavaScriptCore/runtime/UStringImpl.cpp	(revision 0)
> @@ -0,0 +1,176 @@
> +/*
> + *  Copyright (C) 1999-2000 Harri Porten (porten at kde.org)
> + *  Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009 Apple Inc. All rights
reserved.
> + *  Copyright (C) 2007 Cameron Zwarich (cwzwarich at uwaterloo.ca)
> + *  Copyright (C) 2009 Google Inc. All rights reserved.

There may be too much copyright here.

It would be best, if this file is really a copied piece of UString.cpp, to do
this by using "svn copy" so we see the deletion of the non-moved part of the
file, and then the changes to the moved bit. The way you did it here, the code
all looks new and has no history. But yet the copyright indicates you think it
does have history. If the only part you kept was the hash function then I don't
think you need non-Apple copyrights. Maciej and I wrote the hash function.

> +#include "config.h"
> +#include "UStringImpl.h"
> +#include "Identifier.h"
> +#include "UString.h"
> +#include "UTF8.h"

The UStringImpl.h include should be in a separate paragraph by itself.

> +using namespace WTF;

This should not be needed. If it is, then the WTF headers are done wrong, I
believe.

> +COMPILE_ASSERT(sizeof(UChar) == 2, uchar_is_2_bytes);

What code below assumes this?

> +    ASSERT((bufferOwnership() == BufferShared) || ((bufferOwnership() ==
BufferOwned) && !m_dataBuffer.asPtr<void*>()));

I find it hard to read this assertion. Maybe too many parentheses?

> Index: JavaScriptCore/runtime/UStringImpl.h
> ===================================================================
> --- JavaScriptCore/runtime/UStringImpl.h	(revision 0)
> +++ JavaScriptCore/runtime/UStringImpl.h	(revision 0)
> @@ -0,0 +1,276 @@
> +/*
> + *  Copyright (C) 1999-2000 Harri Porten (porten at kde.org)
> + *  Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009 Apple Inc. All rights
reserved.
> + *  Copyright (C) 2009 Google Inc. All rights reserved.

Same comments about copyright and "svn copy". Lets keep the appropriate amount
of history.

> +#include <stdint.h>

Why? Are these really used in the header?

> +#include <wtf/FastMalloc.h>

Not needed. We use Noncopyable and if we include that header, then FastMalloc.h
is already taken care of.

> +#include <limits.h>

Why? Don't we get everything we need from the C++ <limits>?

> +    UntypedPtrAndBitfield() {}

Uninitialized? Really? Maybe it should be zeroed instead? Or perhaps we don't
need it at all?

Does this code come from WebCore::StringImpl? If so, could we svn copy from
there?

> +    unsigned hash() const { if (m_hash == 0) m_hash = computeHash(data(),
m_length); return m_hash; }

Our coding style would be !m_hash.

> +    void setHash(unsigned hash) { ASSERT(hash == computeHash(data(),
m_length)); m_hash = hash; } // fast path for Identifiers

Do we really need a public function for this?

> +    UStringImpl* ref() { m_refCount+=2; return this; }
> +    ALWAYS_INLINE void deref() { if ((m_refCount-=2) == 0) destroy(); }

We normally put spaces around operators like "+=" and "-=".

You need a comment right here explaining why we bump the reference count by 2
instead of one.

Why does ref return UStringImpl* instead of void?

> +	   if (length > (std::numeric_limits<size_t>::max() / sizeof(UChar)))

Extra parentheses here make it harder for me to read.

> +#ifdef NDEBUG
> +    void checkConsistency() const {}
> +#else
> +    void checkConsistency() const
> +    {
> +	   // There is no recursion of substrings.
> +	   ASSERT(bufferOwnerString()->bufferOwnership() != BufferSubstring);
> +	   // Static strings cannot be put in identifier tables, because they
are globally shared.
> +	   ASSERT(!isStatic() || !identifierTable());
> +    }
> +#endif

Normally I prefer to write such ifdefs so there is only one declaration even if
there are two definitions. In fact, I suggest putting the debug version into
the cpp file since we normally don't inline in debug builds anyway. We'd just
do this:

    #ifdef NDEBUG
    inline UStringImpl::checkConsistency() const
    {
    }
    #endif

And keep the class definition tidy with no ifdefs in it at all.

> +    enum BufferOwnership {
> +	   BufferInternal = 0,
> +	   BufferOwned = 1,
> +	   BufferSubstring = 2,
> +	   BufferShared = 3,
> +    };

Is there a reason to specify the numbers for each of these?

> +    // For SmallStringStorage which allocates an array, and uses an in-place
new.

There should be a comma before the word which. And no comma before the word
and.

> +    // Used to construct normal strings with an internal or external buffer.

> +    UStringImpl(UChar* data, int length, BufferOwnership ownership)

I'd expect an assertion that matches the comment.

> +    // Used to construct static strings with an special refCount of 1 -
since refCount
> +    // is incremented/decremented by 2, this can never hit zero.  This means
that the
> +    // static string will never be destroyed.  This is important because
static strings
> +    // will be shared across threads & ref-counted in a non-threadsafe
manner.

We use one space after a period, not two. I suggest a named constant for the
special reference count of 1 so you don't have to put a comment on the line
where you use it.

> +#ifndef NDEBUG
> +    bool isStatic() const { return m_refCount & 1; }
> +#endif

Since this is already private, maybe it doesn't have to be debug only.

> +    int m_refCount;

Since the code relies on overflow semantics for the special reference count,
this should be unsigned, which has defined behavior on overflow, rather than
int, which does not.

I'll say review-, but this looks really good. You can probably land after
making a few fixes.