[webkit-reviews] review denied: [Bug 105692] Element boundaries prevent Japanese line break opportunities : [Attachment 195488] Patch

Tue Apr 2 10:36:55 PDT 2013

Darin Adler <darin at apple.com> has denied Glenn Adams <glenn at skynav.com>'s
request for review:
Bug 105692: Element boundaries prevent Japanese line break opportunities
https://bugs.webkit.org/show_bug.cgi?id=105692

Attachment 195488: Patch
https://bugs.webkit.org/attachment.cgi?id=195488&action=review

------- Additional Comments from Darin Adler <darin at apple.com>
View in context: https://bugs.webkit.org/attachment.cgi?id=195488&action=review

Generally looks great.

review- because of the problem in acquireLineBreakIterator.

> Source/WebCore/ChangeLog:37
> +	   (LazyLineBreakIterator):

Ideally, you should remove bogus lines like this that are added because of bugs
in the prepare-ChangeLog script.

Even better, you could get someone to fix that script, or do it yourself!

> Source/WebCore/platform/text/TextBreakIterator.h:82
> +    unsigned getPriorContextLength() const

This function should be named priorContextLength in WebKit coding style. We
reserve the word “get” for functions that return results in some other way.

> Source/WebCore/platform/text/TextBreakIterator.h:85
> +	   ASSERT(sizeof(m_priorContext) / sizeof(m_priorContext[0]) >= 2);

These should be a COMPILE_ASSERT rather than just an ASSERT.

WTF has a macro for array length called WTF_ARRAY_LENGTH that should be used
instead of writing out the math with two separate sizeofs.

This should be asserting that it’s == 2, not just >= 2, since this function’s
implementation assumes there are exactly 2 elements. Generally speaking, all
the functions seem specific to a 2-element, array, so the assertions don’t seem
quite right. For example, this one looks only at two characters, and others
assume that the last character is at index 1 and the second to last at index 0.
So all these functions are assuming 2 characters of context, not >= 2 or >= 1.

> Source/WebCore/platform/text/TextBreakIterator.h:149
> +	   // N.B. prior context is explicitly not reset here since it needs to
be accumulated and retained
> +	   // across multiple inline nodes, each of which cause this reset() to
be invoked in order
> +	   // to update the primary context string.

This comment makes it sound like we should consider renaming the reset()
function for clarity about what it does and does not reset.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:87
> +	   const void* pointerSource = static_cast<const void*>(source);

It’s strange to first cast these to void* and then to char*. I suggest a
reinterpret_cast directly to char*. There’s no problem comparing a void* to a
char*, and the number of casts in this function might be cut down considerably.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:129
> +    ASSERT(errorCode);

Seems a bit excessive to assert this. We don’t have to assert every pointer as
non-null.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:130
> +    ASSERT_NOT_REACHED();

Might be worth commenting why this is correct.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:145
> +    return offset < INT32_MAX ? static_cast<int32_t>(offset) : 0;

I didn’t know that INT32_MAX existed, portably. Normally I’d use
numeric_limits<int32_t>::max().

Why is it OK to return 0 if the offset is too big to fit into 32 bits? A
comment should say why; it’s not obvious.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:187
> +    text->chunkLength = length < INT32_MAX ? static_cast<int32_t>(length) :
0;

Same comments as above.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:209
> +    text->chunkOffset = nativeIndex < INT32_MAX ?
static_cast<int32_t>(nativeIndex) : 0;

Same thing again.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:314
> +    UText* text = utext_setup(&utWithBuffer->text,
sizeof(utWithBuffer->buffer), status);
> +    if (U_SUCCESS(*status))
> +	   textInit(text, &textLatin1Funcs, string, length, priorContext,
priorContextLength);
> +    return text;

Is utext_setup guaranteed to return 0 if the status does not indicate success?
WebKit style is normally early return for errors, so we’d write:

    if (U_FAILURE(*status)) {
	ASSERT(!text);
	return 0;
    }

And continue with the call to textInit, rather than putting the normal case
inside an if statement.

> Source/WebCore/platform/text/TextBreakIteratorICU.cpp:419
> +    if (U_SUCCESS(*status))
> +	   textInit(text, &textUTF16Funcs, string, length, priorContext,
priorContextLength);
> +    return text;

Same comment as above.

> Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:263
> +	   Vector<UChar> utf16string(priorContextLength + length);

This isn’t a good name for the local variable. There’s an argument called
string that is also UTF-16, so calling this utf16string does nothing to make
clear why it’s different. Also, the capitalization does not match WebKit coding
style, which requires a capital S. I would call this stringWithPrependedContext
or some other sort name that makes clear what the difference is.

> Source/WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp:268
> +	   return acquireLineBreakIterator(utf16string.data(),
utf16string.size(), locale, 0, 0);

Having the function call itself like this doesn’t seem optimal to me.

Is it OK to have a line break iterator that outlives the data it’s pointed to?
When the local vector, utf16string, is deleted, why will the line break
iterator still work properly? I think this is a major problem!