[Webkit-unassigned] [Bug 89978] New: [GTK] MHTML files not being loaded due to reported mime type not supported

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Jun 26 06:56:38 PDT 2012


https://bugs.webkit.org/show_bug.cgi?id=89978

           Summary: [GTK] MHTML files not being loaded due to reported
                    mime type not supported
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: Unspecified
        OS/Version: Unspecified
            Status: NEW
          Keywords: Gtk
          Severity: Normal
          Priority: P2
         Component: WebKit Gtk
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: msanchez at igalia.com
                CC: mrobinson at webkit.org, cgarcia at igalia.com,
                    svillar at igalia.com


With bug 7168 fixed, we now have MHTML read support in WebKit, which enables any WebKit-based browser to read MHTML files as if they were "normal" HTML files with the external resources embedded in them (normally using base64 encoding).

However, at the moment no WebKitGTK-based browser is able to load an MHTML file (typically with a .mht extension), because the patch for bug 7168 only enables WebCore to load files of mime type multipart/related when MHTML is enabled at build time:

    Source/WebCore/loader/archive/ArchiveFactory.cpp
    ------------------------------------------------
    [...]
    #if ENABLE(MHTML)
        mimeTypes.set("multipart/related", archiveFactoryCreate<MHTMLArchive>);
    #endif
    [...]

This seems to be working wonderfully in other WebKit-based browsers (e.g. Chromium) but not in WebKitGtk, since the mimetype obtained for those MHTML files is not multipart/related, but message/rfc822[1] instead.
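To make the mismatch concrete, here is a minimal, self-contained sketch of a mime type -> factory table like the one in the excerpt above. The Archive and MHTMLArchive structs, buildArchiveMIMETypes(), isArchiveMIMEType() and the acceptRfc822 switch are all hypothetical stand-ins written for illustration; they are not WebCore's real classes or API, and registering message/rfc822 is only one possible direction, not a confirmed fix.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for WebCore's archive classes (not the real API).
struct Archive { virtual ~Archive() = default; };
struct MHTMLArchive : Archive { };

// A factory is a plain function returning a freshly created archive.
using ArchiveFactoryFn = std::unique_ptr<Archive> (*)();

template <typename ArchiveType>
std::unique_ptr<Archive> archiveFactoryCreate()
{
    return std::make_unique<ArchiveType>();
}

// Builds the mime type -> factory table. With acceptRfc822 == false this
// mirrors the excerpt above (multipart/related only). The acceptRfc822
// switch is an assumption made for this sketch.
std::map<std::string, ArchiveFactoryFn> buildArchiveMIMETypes(bool acceptRfc822)
{
    std::map<std::string, ArchiveFactoryFn> mimeTypes;
    mimeTypes["multipart/related"] = &archiveFactoryCreate<MHTMLArchive>;
    if (acceptRfc822)
        mimeTypes["message/rfc822"] = &archiveFactoryCreate<MHTMLArchive>;
    return mimeTypes;
}

// True if the loader would treat a resource of this mime type as an archive.
bool isArchiveMIMEType(const std::map<std::string, ArchiveFactoryFn>& mimeTypes,
                       const std::string& mimeType)
{
    return mimeTypes.find(mimeType) != mimeTypes.end();
}
```

With the table built as in the excerpt, a lookup for message/rfc822 (the type reported on GTK) finds no factory, so the file is never handed to the MHTML code path. Blindly registering message/rfc822 would also route genuine e-mail files to the MHTML loader, so it would probably need to be combined with a look at the file's own headers.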

I've been digging into this issue for some time today and found out that the sequence of libraries used to retrieve the mime type for a given file, when using WebKitGTK, is as follows:

  WKGTK -> libsoup -> GIO -> GVFS -> GNOME-VFS -> shared-mime-info (from freedesktop.org)

So, I hacked for a while on shared-mime-info's XML file, where the heuristics and matching rules are defined, and eventually got it to report that .mht files generated with WebKit were of mime type multipart/related instead of message/rfc822. Unfortunately, the very same patch in shared-mime-info caused some regressions there, making some files containing true email messages report multipart/related too, instead of message/rfc822.

For the sake of completeness, this is an excerpt of the content of the MHTML file I was using for testing:

  From: <Saved by WebKit>
  Subject: 
  Date: Wed, 26 Jun 2012 12:38:36 +0100
  MIME-Version: 1.0
  Content-Type: multipart/related;
        type="text/html";
        boundary="----=_NextPart_000_47A9_6EDBB149.472FD3B3"

  ------=_NextPart_000_47A9_6EDBB149.472FD3B3
  Content-Type: text/html
  Content-Transfer-Encoding: quoted-printable
  Content-Location: file:///home/mario/work/gnome3/WebKitTests/mhtml.html

  <html><head><meta charset=3D"ISO-8859-1"></head><body>
      A red box: <img src=3D"data:image/png;base64,iVBORw0...ggg=3D=3D"><br>
      A blue box: <img src=3D"data:image/png;base64,iVBORw...ggg=3D=3D">
   =20
  </body></html>
  ------=_NextPart_000_47A9_6EDBB149.472FD3B3
  Content-Type: image/png
  Content-Transfer-Encoding: base64
  Content-Location:
  data:image/png;base64,iVBORw0...ggg==
  [...]


So, after reaching that point I realized that perhaps the problem was not in shared-mime-info: despite being a .mht file, it is also written in a perfectly valid e-mail format, so it's not strange that message/rfc822 is returned for it. So I now think it would be better to make some changes here in WebKit (at least in WebKitGTK) to support reading MHTML files, hence this bug.

Furthermore, we should also consider that there are cases where an MHTML file is not encoded using multipart/related (when there are no external resources linked, according to [2]). This suggests that the 'multipart/related' rule added with the patch for bug 7168 could be insufficient, and also that making shared-mime-info return that mime type instead of message/rfc822 could even be wrong (since such files could very well be encoded as text/plain or text/html).
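One way to tell such a file apart from an ordinary e-mail on the WebKit side would be to look at its top-level Content-Type header rather than trusting the reported mime type. The following is only a sketch under that assumption: toLower(), topLevelContentType() and looksLikeMHTML() are hypothetical helpers, not existing WebKit functions, and the check deliberately covers only the multipart/related case with a quoted type="text/html" parameter, not the single-part files described in [2].

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Lowercases an ASCII string (header names and media types are case-insensitive).
std::string toLower(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Returns the lowercased value of the first Content-Type header in the
// top-level header block, with folded continuation lines joined, or an
// empty string if there is none.
std::string topLevelContentType(const std::string& data)
{
    const std::string headerName = "content-type:";
    std::istringstream input(data);
    std::string line;
    std::string value;
    bool collecting = false;

    while (std::getline(input, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back(); // tolerate CRLF line endings
        if (line.empty())
            break; // a blank line ends the header block
        bool isContinuation = line[0] == ' ' || line[0] == '\t';
        if (isContinuation) {
            if (collecting)
                value += line; // folded part of the Content-Type header
            continue;
        }
        if (collecting)
            break; // next header starts; Content-Type is complete
        if (toLower(line).compare(0, headerName.size(), headerName) == 0) {
            value = line.substr(headerName.size());
            collecting = true;
        }
    }
    return toLower(value);
}

// Heuristic: multipart/related whose root part is declared as text/html.
bool looksLikeMHTML(const std::string& data)
{
    const std::string multipartRelated = "multipart/related";
    std::string contentType = topLevelContentType(data);
    std::string::size_type start = contentType.find_first_not_of(" \t");
    if (start == std::string::npos)
        return false;
    contentType = contentType.substr(start);
    bool isMultipartRelated =
        contentType.compare(0, multipartRelated.size(), multipartRelated) == 0;
    bool rootIsHTML = contentType.find("type=\"text/html\"") != std::string::npos;
    return isMultipartRelated && rootIsHTML;
}
```

Applied to the test file quoted above, the top-level header is multipart/related with type="text/html", so the heuristic would accept it, while a plain text/plain message would be rejected. An e-mail that happens to use multipart/related with an HTML root part would still be accepted, so this narrows the ambiguity but does not remove it.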

So, I'll be working on a patch to fix this issue, at least in WebKitGTK. Otherwise, the effort put into bug 89872 and bug 89873 will be pointless, at least for GTK.

[1] http://tools.ietf.org/html/rfc822
[2] http://tools.ietf.org/html/rfc2557#section-9.1
