[webkit-reviews] review denied: [Bug 105692] Element boundaries prevent Japanese line break opportunities : [Attachment 193924] Patch

Wed Mar 20 10:44:15 PDT 2013

Darin Adler <darin at apple.com> has denied Glenn Adams <glenn at skynav.com>'s
request for review:
Bug 105692: Element boundaries prevent Japanese line break opportunities
https://bugs.webkit.org/show_bug.cgi?id=105692

Attachment 193924: Patch
https://bugs.webkit.org/attachment.cgi?id=193924&action=review

------- Additional Comments from Darin Adler <darin at apple.com>
View in context: https://bugs.webkit.org/attachment.cgi?id=193924&action=review

Thanks for tackling this. A lot of the new code here isn’t following WebKit
coding style and it should.

I’m concerned about the level of test coverage for this significant amount
code. I am pretty sure I spotted an infinite loop in the code, not sure why it
was not a problem in practice.

> Source/WebCore/ChangeLog:28
> +	   (WebCore):

It’s best to remove bogus lines like this that the script adds to change log
entries.

> Source/WebCore/platform/text/TextBreakIterator.h:57
> +    const int TextBreakPriorContextCapacity = 2;

I think this constant would be better as a private member of
LazyLineBreakIterator rather exposed at the top level as a public constant. It
has nothing to do with the interface to the iterator.

> Source/WebCore/platform/text/TextBreakIterator.h:83
> +    UChar lastCharacter() const { return m_priorContext[1]; }

Do we need an assertion about m_priorContextLength here?

> Source/WebCore/platform/text/TextBreakIterator.h:85
> +    const UChar* getPriorContext(int& priorContextLength)

This is used only within the class, so I suggest it be public. I’m also not
sure it needs to be a function. Having this logic in the get function might be
good.

> Source/WebCore/platform/text/TextBreakIterator.h:95
> +	   m_priorContextLength = last ? (secondToLast ? 2 : 1) : 0;

This seems a bit peculiar. We are storing a length, but it’s not additional
data; it’s just an indicator of which characters are zero. m_priorContextLength
should not be stored if it can be computed.

> Source/WebCore/platform/text/TextBreakIterator.h:103
> +    void resetPriorContext()

Renaming this to be more abstract isn’t all that useful. We still are storing
just two characters. It’s not all that helpful to pretend to be more general in
our naming.

> Source/WebCore/platform/text/TextBreakIterator.h:110
> -    TextBreakIterator* get()
> +    TextBreakIterator* get(int& priorContextLength)

This interface is confusing enough that I think we’ll need a comment.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:71
> +static inline int64_t uTextWithPriorContextGetNativeLength(UText* ut)
> +{
> +    return ut->a + ut->b;
> +}

Nothing here seems to be specific to “a UText with prior context” so the
function name seems peculiar. A shorter name would likely make the code easier
to read. Further, including the word “Get” in this name does not make clear
“this is the inlined one you want to call”. It’s just a nearly identical name
to the other function and it seems likely we’d call the wrong one.

Further, I don’t think that adding “with prior context” to the names of all the
functions is necessary. It’s really hard to parse those long names. There must
be a more concise way to put things.

WebKit coding style is to use words rather than abbreviations. I suggest the
word “text” rather than the letters “ut” as the name of the object pointer in
all these functions.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:77
> +// Generic native length function, works with both Latin1 and UC text
providers.
> +static int64_t uTextWithPriorContextNativeLength(UText* ut)
> +{
> +    return uTextWithPriorContextGetNativeLength(ut);
> +}

Here the long name is perhaps justified, since this function exists to be put
in the method table below.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:80
> -static UText* uTextLatin1Clone(UText* destination, const UText* source,
UBool deep, UErrorCode* status)
> +// Generic clone function, works with both Latin1 and UC text providers.
> +static UText * uTextWithPriorContextClone(UText* dest, const UText* src,
UBool deep, UErrorCode* status)

Why the formatting change to add a space before the "*" here? That’s not normal
for our project.

Why the abbreviation of destination and source? In WebKit we prefer words to
abbreviations.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:89
> +    void* extraNew = dest->pExtra;

What does “extra new” mean? Unclear name.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:93
> +    int sizeToCopy = src->sizeOfStruct;
> +    if (sizeToCopy > dest->sizeOfStruct)
> +	   sizeToCopy = dest->sizeOfStruct;

Would read more clearly with std::min instead of an if statement.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:98
> +    if (extraSize)
> +	   memcpy(dest->pExtra, src->pExtra, extraSize);

The if here is unneeded and I suggest omitting it.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:103
> +static int32_t uTextWithPriorContextExtract(UText* ut, int64_t start,
int64_t limit, UChar* dest, int32_t destCapacity, UErrorCode* pErrorCode)

The use of a p prefix to mean a pointer is not a WebKit standard style element,
so should not be done.

The name “ut” does not seem clear; a word would be better than a pile of
letters and more consistent with WebKit coding style.

Please don’t use the abbreviation dest; use the word destination instead.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:128
> +    int32_t copied = 0;

It’s normally best to name local variables with noun phrases rather than
fragments. The word “copied” sounds like it could be a boolean and it would be
clearer to call it charactersCopied or something along those lines.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129
> +    while (copied < length) {

This loop does not seem to increment “copied”, so I’d expect it to be an
infinite loop. How did you test?

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:131
> +	   if (!ut->pFuncs->access(ut, start, TRUE))
> +	       ASSERT_NOT_REACHED();

Is continuing to run an ignoring the failure the most helpful behavior here for
release builds?

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:133
> +	   int32_t ucAvailable = ut->chunkLength - ut->chunkOffset;
> +	   int32_t ucToCopy = ucAvailable < length ? ucAvailable : length;

The “uc” prefix on these local variable names is not the usual WebKit style.
It’s also best to name local variables with noun phrases rather than fragments.
“to copy” is not really so great.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:134
> +	   StringImpl::copyChars(&dest[copied], ut->chunkContents +
ut->chunkOffset, static_cast<unsigned>(ucToCopy));

Why the static_cast here? Normally we can pass an int32_t to an unsigned
without an explicit cast.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:146
> +// Generic map offset to native index function, works with both Latin1 and
UC text providers.

The comments like this one before each function don’t say much. I’d omit them.
It’s true that these functions are used twice, for both Latin-1 and Unicode,
but they’re not “generic” in any additional interesting sense, nor is it
helpful to repeat the function name in prose in a comment.

Also, I am not comfortable with the abbreviation “uc” for “Unicode” and I’d
prefer we keep the use of that to a minimum.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:157
> +    ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX);

What guarantees this is true? We can assert it only if we know it will be true.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:238
> +	       ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX);

Same question as elsewhere. What guarantees this?

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:250
> +	       ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX);

Same question as elsewhere. What guarantees this?

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:291
> +static const struct UTextFuncs uTextLatin1WithPriorContextFuncs = 
> +{

In WebKit coding style the brace would go on the previous line.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:347
> +    ASSERT_UNUSED(forward, forward ? nativeIndex >= ut->b && nativeIndex <
ut->a + ut->b : nativeIndex > ut->b && nativeIndex <= ut->a + ut->b);

We don’t normally do assertions with && in them. I suggest breaking this out
into two assertions, one about the start and one about the end.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:350
> +    ASSERT(ut->chunkNativeLimit - ut->chunkNativeStart < INT32_MAX);

What guarantees this? We can’t just assert because we hope it’s true.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:353
> +    ASSERT(nativeIndex - ut->chunkNativeStart < INT32_MAX);

What guarantees this? We can’t just assert because we hope it’s true.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:372
> +    ASSERT(nativeIndex < INT32_MAX);

What guarantees this? We can’t just assert because we hope it’s true.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:486
> +    UText uTextUCWithPriorContextLocal = UTEXT_INITIALIZER;

This long name does not seem helpful. Why does the phrase “with prior context”
need to appear in the variable name? Why not just call this textLocalStorage or
something like that?

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:488
> +    UErrorCode uTextOpenStatus = U_ZERO_ERROR;

Why not just call this openStatus?

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:489
> +    UText* uTextUCWithPriorContext =
UTextOpenUTF16(&uTextUCWithPriorContextLocal, string, length, priorContext,
priorContextLength, &uTextOpenStatus);

Why not just call this text?