[webkit-reviews] review denied: [Bug 65803] Initial pass at a new XML tree builder : [Attachment 103292] Patch

Mon Aug 8 15:52:11 PDT 2011

Adam Barth <abarth at webkit.org> has denied Jeffrey Pfau <jeffrey at endrift.com>'s
request for review:
Bug 65803: Initial pass at a new XML tree builder
https://bugs.webkit.org/show_bug.cgi?id=65803

Attachment 103292: Patch
https://bugs.webkit.org/attachment.cgi?id=103292&action=review

------- Additional Comments from Adam Barth <abarth at webkit.org>
View in context: https://bugs.webkit.org/attachment.cgi?id=103292&action=review

Lots of detailed comments.  Mostly r- for manual refcounting.

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:87
> +    m_finishWasCalled = true;

I thought you were going to ASSERT(!m_finishWasCalled) here?

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:118
>  void NewXMLDocumentParser::prepareToStopParsing()
>  {
> +    ScriptableDocumentParser::prepareToStopParsing();
>  }
>  
>  void NewXMLDocumentParser::stopParsing()
>  {
> +    ScriptableDocumentParser::stopParsing();
>  }

Why do these exist if we're just going to call through to the base class?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:82
> +    default:
> +	       break;

Bad indent.  Usually we skip the default case to have the compiler yell at us
if we forget any cases.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:91
> +    if (node != m_document)
> +	   node->ref();

Why do we need to use manual ref counting?  Can't we use RefPtr to do this work
automatically?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:95
> +    m_currentNode = node;

How does a currentNode differ from a currentNodeItem?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:102
> +    if (!m_currentNode)
> +	   return;

When would we ever pop the current node when m_currentNode is null?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:106
> +    if (m_currentNode != m_document)
> +	   m_currentNode->deref();

Frown.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:127
> +    m_document->setXMLVersion(String::adopt(token.xmlVersion()), ec);

We should use the copying string constructor rather than adopt here.  The
copying constructor is faster for small strings because adopt incurs an extra
malloc.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:133
> +    if (ec)
> +	   m_parser->stopParsing();

Do we need to check for ec after each call?  What if the second call clobbers
the ec from the first?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:148
> +    AtomicString publicIdentifier(token.publicIdentifier().data(),
token.publicIdentifier().size());
> +    AtomicString systemIdentifier(token.systemIdentifier().data(),
token.systemIdentifier().size());

The design for the other fields is to have the AtomicXMLToken create these
atomic strings on construction.  We'll need to do the work either way, but
doing it that way lets us save work if we need to look at these values in
multiple places.  That's probably not much of a win here, but it's useful to be
consistent.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:160
> +	   || (publicIdentifier == xhtmlMobile)
> +	  )

These lines should be merged.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:166
> +    exitText();

Why don't the processing instruction or the xml declaration cause use to
exitText?  Do we need to assert that there's not text to exit?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:214
> +

Extra blank line.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:215
> +    enterText(String(token.characters().data(), token.characters().size()));

So, you've got an extra memcpy here.  You're memcpying data into this string,
and then you'll end up memcpying into an existing text node, if there is one. 
That turned out to be costly in the HTML parser, which is why we came up with
the notion of an external character buffer.  You want to avoid memcpying the
data into the String until as late as possible.  Doing the memcpy here is
probably too early.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:220
> +    RefPtr<CDATASection> cdata = CDATASection::create(m_document,
token.data());

No exitText?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:231
> +    m_currentNode->parserAddChild(comment.get());
> +    if (!comment->attached())
> +	   comment->attach();

This stanza is repeated all over the place.  Maybe it should be factored out?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:236
> +    // FIXME: support internal subset

Comments should be complete sentences.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:246
> +    if (token.attributes()) {

Prefer early exit.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:247
> +	   for (unsigned i = 0; i < token.attributes()->length(); ++i) {

unsigned => size_t

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:260
> +    if (token.attributes()) {

Prefer early exit.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:261
> +	   for (unsigned i = 0; i < token.attributes()->length(); ++i) {

unsigned => size_t

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:262
> +	       Attribute* attribute = token.attributes()->attributeItem(i);

It seems wasteful to iterate over the attributes twice.  Can we do the work in
a single pass over the attributes?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:265
> +		   newElement->setAttributeNS(XMLNSNames::xmlnsNamespaceURI,
"xmlns:" + attribute->name().localName(), attribute->value(), ec);

Why is there an ec parameter here?  Do we need to handle exceptions?  Can this
run JavaScript synchronously (e.g., in a DOM mutation event)?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:295
> +	   m_leafTextNode->appendData("\"", ec);

Can these trigger synchronous execution of JavaScript?	In what cases could an
ec be generated?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:309
> +    for (unsigned i = 0; i < name.length(); ++i) {

unsigned => size_t

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:318
> +	  enterText(String(reinterpret_cast<UChar*>(&entityValue), 1));

Yuck.  That's an extra malloc.	That's going to kill performance if there's a
long string of entities.  This is related to enterText requiring us to copy
into a String too early.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:321
> +	   UChar utf16Pair[2] = { U16_LEAD(entityValue), U16_TRAIL(entityValue)
};
> +	   enterText(String(utf16Pair, 2));

Same problem here.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:337
> +	   m_leafTextNode->appendData(text, ec);

Is this going to involve copying the whole string over and over again?	We
might want to use something like StringBuilder to receive all these string
fragments and the actually create the text node during exitText.

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:356
> +    if (!prefix.isNull()) {
> +	   if (m_currentNodeItem.hasNamespaceURI(prefix))

These can be merged onto one line.  Why isNull versus isEmpty?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:366
> +    if (parent) {

Prefer early exit.