[webkit-reviews] review denied: [Bug 7168] Support reading of MHTML (multipart/related) web archives : [Attachment 23119] MHTML support for WebKit GTK

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Thu Sep 4 09:26:26 PDT 2008

David Kilzer (ddkilzer) <ddkilzer at webkit.org> has denied Marco Barisione
<marco.barisione at collabora.co.uk>'s request for review:
Bug 7168: Support reading of MHTML (multipart/related) web archives

Attachment 23119: MHTML support for WebKit GTK

------- Additional Comments from David Kilzer (ddkilzer) <ddkilzer at webkit.org>
Great work!!  I was secretly hoping you included a MIME parser and code to
write out MIME-encoded files so all platforms could benefit from this feature,
but it's nice that you used GMime for the GTK port and provided a way for other
platforms to provide their own MIME parsing/creating solutions!  :)

This is a great first patch, but there are some things that need to be fixed
before it lands. I'm going to mark this r-; see details below.

- Missing a ChangeLog entry.  See <http://webkit.org/coding/contributing.html>.

>@@ -543,8 +543,15 @@ PassRefPtr<ArchiveResource>
DocumentLoader::subresource(const KURL& url) const
>	  return archiveResourceForURL(url);
>     CachedResource* resource = doc->docLoader()->cachedResource(url);
>+    // FIXME not checking resource->preloadResult() is just a hack to
>+    // make it work. I have no clue on what is going on here.
>+    if (!resource)
>+	  return 0;
>     if (!resource || resource->preloadResult() ==
>	  return archiveResourceForURL(url);
>     return ArchiveResource::create(resource->data(), url,
> }

This definitely needs to be fixed before the patch lands.  Please talk to
'antti' in the #webkit channel on irc.freenode.net about preloading and how it
affects this patch, or (if he's not available), beidson/bradee-oh.

>+	  else
>+	      // MHTML archives do not store the frame name, so we use the
>+	      // URL as the name
>+	      m_subframes.set((*iFrame)->mainResource()->url().string(),

I think you can leave off ".string()" after "url()" here as there is a method
to turn a KURL object into a String automatically.

>+PassRefPtr<Archive> ArchiveResourceCollection::popSubframeArchive(const
String& frameName, const KURL& url)
> {
>-    return m_subframes.take(frameName);
>+    PassRefPtr<Archive> archive = m_subframes.take(frameName);

You want a RefPtr<Archive> here, not a PassRefPtr.

>+    if (archive)
>+	  return archive;
>+    return m_subframes.take(url.string());

Again, I think ".string()" may be left off here.

>+bool MHTMLWebArchive::init(SharedBuffer* data)
>+    ASSERT(data);
>+    if (!data)
>+	  return false;
>+    if (!parseRawDataRepresentation(data))
>+	  return false;
>+    ASSERT(!m_mainResourceType.isEmpty());
>+    /* MHTML files do not have the concept of subframes, so resources are
>+     * and are assigned to the main frame. To make the resources accessible
>+     * also to subframes we add all the subresources to all the subframes. */

This is probably fine to land this way, but please consider talking to
beidson/bradee-oh in #webkit on irc.freenode.net about how to improve this data

For example, would it make more sense to put all resources as main resources
(which more closely matches the structure of MHTML files), but have all
subframes look for their resources in the main resources if the archive is an
MHTML archive?

>+    const Vector<RefPtr<Archive> >& subframes(subframeArchives());
>+    unsigned i;
>+    for (i = 0; i < subframes.size(); ++i) {

Please declare the loop variable inside the for loop, e.g., "for (unsigned i =
0; ...".

>+	  const Vector<RefPtr<ArchiveResource> >& resources(subresources());
>+	  unsigned j;
>+	  for (j = 0; j < resources.size(); ++j)

Please declare the loop variable inside the loop.

>+	  /* For some reasons IE sometimes saves sub frames using the
"application/octet-stream" mime type instead of
>+	   * text/html. In all the cases we tested these HTML files always
start with a UTF-8 BOM followed by a
>+	   * "<!DOCTYPE ...", so we use it as an heuristic to fix the mime
type. */
>+	  const char* htmlHead = "\xef\xbb\xbf<!DOCTYPE ";
>+	  CString octetHead(buffer->data(), strlen(htmlHead));
>+	  if (!strcmp(octetHead.data(), htmlHead))
>+	      response.setMimeType("text/html");
>+    }

Is a better way to check for the UTF-8 BOM plus "<!DOCTYPE " here rather than
hard-coding the UTF-8 BOM sequence?  Probably okay to land as is, but it would
be nice if we could get the UTF-8 BOM from a predefined macro or const (perhaps
in ICU?).

>+    // MHTML files do not store subframe names, so we use the URL as the
>+    PassRefPtr<ArchiveResource> resource = ArchiveResource::create(buffer,
response.url(), response.mimeType(), response.textEncodingName(),
response.url().string(), response);

You want a RefPtr here, not a PassRefPtr.  PassRefPtr types are only used in
method argument declarations.  See this unlinked web page for more details: 

Also, I think "response.url().string()" can just be "response.url()" here.

>+	  PassRefPtr<MHTMLWebArchive> archive = create();

Again, RefPtr instead of PassRefPtr.

>+void MHTMLWebArchive::serializeResourceIfNew(ArchiveResource* resource,
HashSet<String>& uniqueSubresources)
>+    /* MHTML files do not have the concept of subframes, so resources are
>+     * To make them accessible also from subframes they have been also
assigned to all
>+     * the subframes (see MHTMLWebArchive::init). When saving a MHTML we want
to avoid
>+     * to save the duplicated resources, so we use uniqueSubresources to keep
track of
>+     * what was already included. */

If you switched to saving the resources only on the main resources, then I
think you would get the uniqueneess for free, right?  Or does it use an array
instead of a hash to store the resources?  (I thought it used a hash.)

>+    const Vector<RefPtr<ArchiveResource> >& resources(subresources());
>+    unsigned i;
>+    for (i = 0; i < resources.size(); ++i)

Please declare the loop variable inside the for loop.

>+	  rootArchive->serializeResourceIfNew(resources[i].get(),
>+    const Vector<RefPtr<Archive> >& subframes(subframeArchives());
>+    for (i = 0; i < subframes.size(); ++i)

Please redeclare 'i' inside the loop variable here.

>+    PassRefPtr<SharedBuffer> sharedBuffer = serializeEnd();

Use RefPtr.

>+    return sharedBuffer;

This becomes "return sharedBuffer.release();" with the change to RefPtr above.

>+    RefPtr<MHTMLWebArchive> archive = create();
>+    archive->setMainResource(mainResource);
>+    for (unsigned i = 0; i < subresources.size(); ++i)
>+	  archive->addSubresource(subresources[i]);
>+    for (unsigned i = 0; i < subframeArchives.size(); ++i)
>+	  archive->addSubframeArchive(subframeArchives[i]);  

This would change if all resources were stored as main resources.

>+    // Without this hack a MHTML stream is saved inside another MHTML stream.

>+    // FIXME Figure out how can LegacyWebArchive work without a similar hack.

>+    if (documentLoader->mainResource()->mimeType() == "message/rfc822")
>+	  return

Because MHTML files don't have the concept of "subFrames" or
"subFrameArchives", I don't think you need to include the if statement check. 
Just return here unconditionally and remove the rest of the code in the method.
 The if statement is always going to be true anyway, correct?

>+PassRefPtr<MHTMLWebArchive> MHTMLWebArchive::create(Range* range)

This method looks very similar to the LegacyWebArchive.cpp method with the same
signature.  Is there any benefit to creating a super class for both
LegacyWebArchive and MHTMLWebArchive to share code?

>+PassRefPtr<MHTMLWebArchive> MHTMLWebArchive::create(const String&
markupString, Frame* frame, Vector<Node*>& nodes)
> [...]
MHTMLWebArchive::createFromSelection(Frame* frame)

These two methods look identical, too.	I'd suggest making a super class to
hold common code rather than duplicating it here.

These methods may also not be needed since I think they're used for copying
documents (or document fragments) to the pasteboard on Mac OS X, which just
happens to use the same format as *.webarchive files.

>+typedef struct _GMimeObject GMimeObject;
>+typedef struct _GMimePart GMimePart;
>+typedef struct _GMimeMultipart GMimeMultipart;
>+namespace WebCore {
>+void handleMimeObject(GMimeObject*, void*);
>+namespace WebCore {
>+class Frame;
>+class Node;
>+class Range;
>+class MHTMLWebArchive : public Archive {

I would rather see another #if PLATFORM(GTK)/#endif rather than two "namespace
WebCore { }" statements here.

>+    PassRefPtr<SharedBuffer> sharedBuffer = SharedBuffer::create(buffer,

Use RefPtr.

>+    addMimeResource(sharedBuffer, response);

"sharedBuffer" becomes "sharedBuffer.release()".

>+    PassRefPtr<SharedBuffer> sharedBuffer =
SharedBuffer::create(byteArray->data, byteArray->len);

Use RefPtr.

>+    return sharedBuffer;

Becomes "sharedBuffer.release()".

>+    PassRefPtr<MHTMLWebArchive> archive =
>+    if (!archive)
>+	  return false;
>+    PassRefPtr<SharedBuffer> buffer = archive->rawDataRepresentation();
>+    return g_file_set_contents(filename, buffer->data(), buffer->size(), 0);

Use RefPtr.  No other changes should be necessary.

As I said above, r- to fix the issues above, but a great first patch!!

More information about the webkit-reviews mailing list