[webkit-reviews] review granted: [Bug 107337] BackgroundHTMLParser should be able to atomize well-known strings : [Attachment 192152] Patch

Fri Mar 8 11:08:48 PST 2013

Adam Barth <abarth at webkit.org> has granted Eric Seidel <eric at webkit.org>'s
request for review:
Bug 107337: BackgroundHTMLParser should be able to atomize well-known strings
https://bugs.webkit.org/show_bug.cgi?id=107337

Attachment 192152: Patch
https://bugs.webkit.org/attachment.cgi?id=192152&action=review

------- Additional Comments from Adam Barth <abarth at webkit.org>
View in context: https://bugs.webkit.org/attachment.cgi?id=192152&action=review

> Source/WebCore/ChangeLog:20
> +	   median= 443.726500002 ms, stdev= 7.25002679952 ms, min=
430.244000047 ms, max= 455.511000007 ms
> +	   to:
> +	   median= 427.849500004 ms, stdev= 9.96967058292 ms, min=
417.914000049 ms, max= 461.528000014 ms

You're reporting the median, but not the mean.	We usually work with the mean.

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:125
> +    // FIXME: Using CaseFoldingHash::equal instead of equalIgnoringCase, as
equalIgnoringCase
> +    // wants non-const StringImpl* (even though it never modifies them).

Maybe we should change equalIgnoringCase to use a const StringImpl* ?

> Source/WebCore/html/parser/HTMLIdentifier.cpp:49
> +    return table;

Should we ASSERT(isMainThread() || !table.isEmpty()) ?

> Source/WebCore/html/parser/HTMLIdentifier.cpp:126
> +	   // We expect some hash collisions, but only for identical strings.
> +	   // Since all of these names are AtomicStrings pointers should be
equal.
> +	   ASSERT_UNUSED(addResult, !addResult.isNewEntry || name ==
addResult.iterator->value.second);

Is this just dumb luck?  What happens when we happen to have two HTML tag names
with the same hash value?  We should at least have a comment here that explain
what to do when that happens.

> Source/WebCore/html/parser/HTMLIdentifier.cpp:133
> +    // This is safe to call from any thread, but must be called before
> +    // using HTMLIdentifier.

I would just ASSERT(isMainThread()) since that's easier and how we plan to use
it anyway.

> Source/WebCore/html/parser/HTMLIdentifier.h:51
> +	   if (vector.size() <= maxNameLength) {

Why not put this branch into findIndex?  That would simply this code
dramatically.

> Source/WebCore/html/parser/HTMLIdentifier.h:62
> +	   if (width == Likely8Bit)
> +	       m_string = StringImpl::create8BitIfPossible(vector);
> +	   else if (width == Force8Bit)
> +	       m_string = String::make8BitFrom16BitSource(vector);
> +	   else
> +	       m_string = String(vector);

This work should probably be done by String.  It's useful in other places.

After these two changes, this function just becomes:

m_index = findIndex(vector.data(), vector.size());
if (m_index != invalidIndex)
    return;
m_string = String(vector, width);

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:286
> +    // Note: This hash compare is not correct for all possible StringImpls,
but
> +    // since we're always comparing against constants, it's OK.

Why is it not correct?	How can two strings be equal if their hashes aren't
equal?