[webkit-reviews] review denied: [Bug 27032] document.title does not replace or remove space characters : [Attachment 43500] Second version of patch

Thu Nov 19 10:51:27 PST 2009

Darin Adler <darin at apple.com> has denied Christian Sejersen
<christian.webkit at gmail.com>'s request for review:
Bug 27032: document.title does not replace or remove space characters
https://bugs.webkit.org/show_bug.cgi?id=27032

Attachment 43500: Second version of patch
https://bugs.webkit.org/attachment.cgi?id=43500&action=review

------- Additional Comments from Darin Adler <darin at apple.com>
Thanks for tackling this. This patch combines refactoring and at least three
different behavior changes. We should ideally do these separately and each
behavior change needs either a test or a justification about why it can't be
tested.

> +	Unify the behaviour of stripping whitespace from the title of the 
> +	window and when using document.title. So in cases where whitespace had 

> +	to be removed isSpaceOrNewline am now taking into account Unicode
separator 
> +	characters. 

Can't land patches with tabs. There are quite a few tabs in the patch.

> +	   * dom/Document.cpp:
> +	   (WebCore::canonicalizedTitle):
> +	   (WebCore::Document::updateTitle):
> +	   * dom/Document.h:
> +	   (WebCore::Document::title):
> +	   * loader/DocumentLoader.cpp:
> +	   (WebCore::DocumentLoader::setTitle):
> +	   * platform/text/StringImpl.h:
> +	   (WebCore::isSpaceOrNewline):

It's better to have per-function comments. It seems I am one of the few people
still doing this. You can win points with me by doing it, though!

> +/*
> + * Performs three operations:
> + *  1. Trim leading and trailing whitespace
> + *  2. Collapse internal whitespace.
> +*/

We use "//" comments almost all the time, not "/* */" except in rare cases.

I'm also not sure this is an important comment. Comments should usually state
"why", not "what". The "what" part should be expressed in the function names
and code itself.

> +static inline String canonicalizedTitle(const String& title)
> +{
> +    //ASSERT(!title.isEmpty());

Assertion should either be removed or added, not added in commented-out.

> +	   
> +    String canonicalTitle = title.stripWhiteSpace();
> +    canonicalTitle = canonicalTitle.simplifyWhiteSpace();
> +	   
> +    return canonicalTitle;
> +}

Is there a a good reason to not write this one one line?

    return title.stripWhiteSpace().simplifyWhiteSpace();

> +    m_canonicalizedTitle = canonicalizedTitle(m_title);

It seems to me that if we store the canonicalized title, then we should check
if it changes, and avoid the setTitle call when the title has not changed.

> -    String title() const { return m_title; }
> +    String title() const { return m_canonicalizedTitle; }

This is a bad situation. We do not want a class where there is a data member
named m_title and a function named title, and they are not the same thing.

We need to adjust the names for clarity. Do we even need an m_title data member
at all? Maybe m_canonicalizedTitle should be the only thing we store. Who uses
m_title?

Maybe m_title should stay and change purpose.

> -/*
> - * Performs four operations:
> - *  1. Convert backslashes to currency symbols
> - *  2. Convert control characters to spaces
> - *  3. Trim leading and trailing spaces
> - *  4. Collapse internal whitespace.
> - */

You eliminated conversion of of control characters to spaces. Why are you sure
that's OK? I'd like to see a separate patch for that behavior change which does
not combine that change with all the refactoring.

> -static inline String canonicalizedTitle(const String& title, Frame* frame)
> -{
> -    ASSERT(!title.isEmpty());
> -
> -    const UChar* characters = title.characters();
> -    unsigned length = title.length();
> -    unsigned i;
> -
> -    StringBuffer buffer(length);
> -    unsigned builderIndex = 0;
> -
> -    // Skip leading spaces and leading characters that would convert to
spaces
> -    for (i = 0; i < length; ++i) {
> -	   UChar c = characters[i];
> -	   if (!(c <= 0x20 || c == 0x7F))
> -	       break;
> -    }
> -
> -    if (i == length)
> -	   return "";
> -
> -    // Replace control characters with spaces, and backslashes with currency
symbols, and collapse whitespace.
> -    bool previousCharWasWS = false;
> -    for (; i < length; ++i) {
> -	   UChar c = characters[i];
> -	   if (c <= 0x20 || c == 0x7F || (WTF::Unicode::category(c) &
(WTF::Unicode::Separator_Line | WTF::Unicode::Separator_Paragraph))) {
> -	       if (previousCharWasWS)
> -		   continue;
> -	       buffer[builderIndex++] = ' ';
> -	       previousCharWasWS = true;
> -	   } else {
> -	       buffer[builderIndex++] = c;
> -	       previousCharWasWS = false;
> -	   }
> -    }
> -
> -    // Strip trailing spaces
> -    while (builderIndex > 0) {
> -	   --builderIndex;
> -	   if (buffer[builderIndex] != ' ')
> -	       break;
> -    }
> -
> -    if (!builderIndex && buffer[builderIndex] == ' ')
> -	   return "";
> -
> -    buffer.shrink(builderIndex + 1);
> -    
> -    // Replace the backslashes with currency symbols if the encoding
requires it.
> -    frame->document()->displayBufferModifiedByEncoding(buffer.characters(),
buffer.length());
> -
> -    return String::adopt(buffer);
> -}

This is a hot code path, and one of the reasons the function was written the
way it was is for performance. Making your change is OK if we know the
performance has not changed. Have you found a way to determine this does not
slow down performance?

> -    String trimmed = canonicalizedTitle(title, m_frame);
> -    if (!trimmed.isEmpty() && m_pageTitle != trimmed) {
> +    if (!title.isEmpty() && m_pageTitle != title) {
>	   frameLoader()->willChangeTitle(this);
> -	   m_pageTitle = trimmed;
> +	   // Replace the backslashes with currency symbols if the encoding
requires it.
> +	   m_pageTitle =
m_frame->document()->displayStringModifiedByEncoding(title);
>	   frameLoader()->didChangeTitle(this);
>      }

This changes behavior for titles that would be empty if made canonical. Is that
OK? Did you test behavior in other browsers? I'd prefer to have the behavior
change in a separate patch from the refactoring for clarity, where the change
is clearly justified and tested.

>  static inline bool isSpaceOrNewline(UChar c)
>  {
> -    // Use isASCIISpace() for basic Latin-1.
> -    // This will include newlines, which aren't included in Unicode DirWS.
> -    return c <= 0x7F ? WTF::isASCIISpace(c) : WTF::Unicode::direction(c) ==
WTF::Unicode::WhiteSpaceNeutral;
> +    // Use isASCIISpace() for ASCII
> +    if (c <= 0x7F)
> +	   return WTF::isASCIISpace(c);
> +    
> +    // Unicode characters in the space, line or paragraph separator
> +    // category are treated as white space
> +    return ((WTF::Unicode::category(c) & 
> +		(WTF::Unicode::Separator_Space | 
> +		 WTF::Unicode::Separator_Line | 
> +		 WTF::Unicode::Separator_Paragraph)));
>  }

What's the motivation for this change? Is this about correctness? If so,
where's the test case that shows the old behavior was incorrect? Is it about
efficiency?

The formatting seems inferior to the existing to me and doesn't follow our
rules about where to put boolean operators in multiple lines.

One way to make these more readable is to use "using namespace" inside the
function.