[webkit-reviews] review denied: [Bug 7168] Support reading of MHTML (multipart/related) web archives : [Attachment 92676] Patch

Sat May 7 00:18:04 PDT 2011

Adam Barth <abarth at webkit.org> has denied Jay Civelli <jcivelli at chromium.org>'s
request for review:
Bug 7168: Support reading of MHTML (multipart/related) web archives
https://bugs.webkit.org/show_bug.cgi?id=7168

Attachment 92676: Patch
https://bugs.webkit.org/attachment.cgi?id=92676&action=review

------- Additional Comments from Adam Barth <abarth at webkit.org>
View in context: https://bugs.webkit.org/attachment.cgi?id=92676&action=review

Generally, this looks like a nice approach.  I'm glad that you were able to
leverage so much of the existing WebArchive code.  The comments below are all
just details.  My main area of concern is the Content-Type parser because
that's complex and tricky code.

> LayoutTests/mhtml/page_with_image_ie-expected.txt:6
> +    RenderBody {BODY} at (8,8) size 784x584
> +	 RenderText {#text} at (0,0) size 126x18

Do these have to be render-tree tests?	It would be much better if they were
text tests...

> Source/WebCore/loader/DocumentLoader.cpp:623
> +	   ArchiveResource* resource = archiveResourceForURL(originalURL);
> +	   if (resource) {

We commonly combine these two lines.

> Source/WebCore/loader/DocumentLoader.cpp:639
> +	     return false;

bad indent

> Source/WebCore/loader/DocumentLoader.cpp:643
> +    case Archive::MHTML:
> +	   return true; // Always fail the load for resources not included in
the MHTML.

Does this need ENABLE(MHTML) ?

> Source/WebCore/loader/FrameLoader.cpp:1018
>  void FrameLoader::loadArchive(PassRefPtr<Archive> prpArchive)
>  {
> -    RefPtr<Archive> archive = prpArchive;
> +    m_archive = prpArchive;

In this case, you can rename prpArchive to archive.

> Source/WebCore/loader/FrameLoader.cpp:2323
> +    const UChar* mimeType = loader->responseMIMEType().characters();
> +    mimeType++;

I don't understand this code.

> Source/WebCore/loader/FrameLoader.h:504
> +    // 0 if the frame content is not from an archive.

I'd remove this comment.

> Source/WebCore/loader/archive/ArchiveFactory.cpp:68
> +    mimeTypes.set("multipart/related", archiveFactoryCreate<MHTMLArchive>);
> +    mimeTypes.set("message/rfc822", archiveFactoryCreate<MHTMLArchive>);

Does this need ENABLE(MHTML) ?

> Source/WebCore/loader/archive/ArchiveResourceCollection.cpp:92
> +	   return archive;

archive.release()

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:49
> +    // For security reasons we only load MHTML pages from the local file
system.
> +    if (!url.isLocalFile())
> +	   return false;

Interesting.  Is that true for the other archive formats?

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:56
> +    // Since MHTML is a flat format, we need to make all franes aware of all
resources.

I don't fully understand what this is doing, but it seems fine.

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:57
> +    for (size_t i = 0; i < parser.frameCount(); i++) {

++i

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:66
> +    return mainArchive;

mainArchive.release()

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:57
> +	   return 0;

return nullptr;

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:60
> +    RefPtr<MHTMLArchive> archive = adoptRef(new MHTMLArchive());

MHTMLArchive should have a static create method.  (All RefCounted types in
WebKey should be constructed via a static create method.)

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:74
> +	       // Ignore IE nesting which makes little sense.

Can you elaborate a bit in this comment?  Does that mean we can't handle all
MHTML files or that the nesting isn't required for parsing them?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:101
> +    return archive;

archive.release()

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:112
> +	       RefPtr<MHTMLArchive> subframe = adoptRef(new MHTMLArchive());

MHTMLArchive::create

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:117
> +    } else
> +	   m_resources.append(resource);

I'd move this to the top with an early return.

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:120
> +PassRefPtr<ArchiveResource> MHTMLParser::parseNextPart(MIMEHeader*
mimeHeader, const String& endOfPartBoundary, const String&
endOfDocumentBoundary, bool* endOfArchiveReached)

endOfArchiveReached => WebKit uses non-const references for out parameters. 
Nutty, I know.

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:122
> +    ASSERT(endOfPartBoundary.isEmpty() == endOfDocumentBoundary.isEmpty());

Should we ASSERT something about endOfArchiveReached here?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:126
> +    bool partEndReached = false;

endOfPartReached for consistency with endOfArchiveReached ?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:166
> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString,
mimeHeader->contentLocation()),

mimeHeader->contentLocation() is always an absolute URL?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:178
> +PassRefPtr<MHTMLArchive> MHTMLParser::frameAt(size_t index) const
> +{
> +    return m_frames[index];
> +}

This function does not appear to transfer ownership.  It should just return an
MHTMLArchive*

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:188
> +PassRefPtr<ArchiveResource> MHTMLParser::subResourceAt(size_t index) const
> +{
> +    return m_resources[index];
> +}

This function does not appear to transfer ownership.  It should just return an
ArchiveResource*

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:45
> +static void unquoteString(String* string)

This would probably be better if it just returned a String.  Strings in WebKit
are immutable.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:70
> +		   keyValuePairs->add(key, value.toString().stripWhiteSpace());

What if there are duplicates?  Do we keep the first or the last?  Maybe we
error out?

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:91
> +    //    if (buffer->isEOF())
> +    //	 return 0;

Please remove commented out code.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:93
> +    RefPtr<MIMEHeader> mimeHeader = adoptRef(new MIMEHeader());

MIMEHeader::create

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:94
> +    WTF::HashMap<String, String> keyValuePairs;

Consider creating a typedef for WTF::HashMap<String, String>

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:95
> +    retrieveKeyValuePairs(buffer, &keyValuePairs);

We should just have retrieveKeyValuePairs return a HashMap rather than using
out-params.  Also, you can probably skip the WTF on the HashMap.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:117
> +    return mimeHeader.release();

yay!

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:119
> +bool MIMEHeader::isMultipart() const

Missing blank line before this line.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:152
> +String MIMEHeader::contentType() const
> +{
> +    return m_contentType;
> +}
> +
> +String MIMEHeader::charset() const
> +{
> +    return m_charset;
> +}
> +
> +MIMEHeader::Encoding MIMEHeader::contentTransferEncoding() const
> +{
> +    return m_contentTransferEncoding;
> +}
> +
> +String MIMEHeader::contentLocation() const
> +{
> +    return m_contentLocation;
> +}
> +
> +String MIMEHeader::multiPartType() const
> +{
> +    return m_multipartType;
> +}
> +
> +String MIMEHeader::endOfPartBoundary() const
> +{
> +    return m_endOfPartBoundary;
> +}

All these one-liners can go in the header.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:163
> +    String encoding(text);
> +    encoding.stripWhiteSpace();
> +    encoding.makeLower();

Do these modify the string?  I would have expected:

encoding = text.lower().stripWhiteSpace();

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:174
> +void MIMEHeader::parseContentType(const String& contentType)

Don't we already have code for parsing ContentTypes?  Wow, this function is
complicated.  I haven't review it in detail.  I'll take a look during the next
pass.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:38
> +// This class is a limited MIME parser used to parse the MIME headers of
MHTML files.

I'd make this a FIXME comment.	Ideally we'd generalize this code to work for
more than just MHTML.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:46
> +    enum Encoding { QuotedPrintable, Base64, SevenBit, Unknown };

These should each get their own line.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:49
> +    // Parses the MIME header from the passed buffer. The buffer is advanced
to
> +    // the end of the header. Returns 0 if there was an error.

This comment can be removed.

> Tools/Scripts/webkitpy/layout_tests/port/test_files.py:47
>  _supported_file_extensions = set(['.html', '.shtml', '.xml', '.xhtml',
'.xhtmlmp', '.pl',
> -				     '.php', '.svg'])
> +				     '.php', '.svg', '.mht'])

Does this need to do a similar check for the supported feature set?