[webkit-dev] HTML5 & Web Links (RFC 5988)

Alexey Proskuryakov ap at webkit.org
Thu Nov 11 11:10:58 PST 2010

On 11.11.2010, at 9:19, Julian Reschke wrote:

>> As far as the Chromium request goes, please consider feature parity with Safari. We've supported non-ASCII file names in Content-Disposition for a while now, and judging by the lack of bug reports, our approach[*] is sufficient for Web compatibility. The only issue I know is with GMail, which blocks Safari server-side, replacing non-ASCII characters with question marks.
> Do you have information on how frequently it's used?

Raw bytes seem to be the most common representation for non-ASCII file names on the Web. Implementing that fixed all bug reports I had about Web compatibility in that respect (except for GMail, of course), and didn't cause new ones. Some examples were Yahoo! Mail, several file sharing services, and several Korean forums.

This is not surprising, as that's the only way to make a download link that works in both IE and Firefox (at least for target audiences, see below).

> Judging from
> <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-03.html#rfc.section.C.4>
> it's not supported in IE, Opera, and Konqueror, so it's definitively not interoperable today (besides, it conflicts with the existing definitions for the header).

The "Encoding Sniffing" column looks somewhat misleading to me, because browsers interpret raw bytes differently. I don't know if any browser "sniffs" encoding in the common sense of the word. But both IE and Firefox support raw bytes in Content-Disposition, although in different ways.

IE: Uses the "Language for non-Unicode programs" setting. So with the system language set to Russian, Content-Disposition is interpreted as windows-1251 (Cyrillic). I'm not sure what it does if decoding fails.
Firefox: Tries UTF-8, then referring document's encoding, then Latin-1.
Safari: Tries UTF-8, then referring document's encoding, then browser default encoding, and then Latin-1, which can never fail.
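
To make the fallback order concrete, here is a minimal Python sketch of the Safari-style chain described above. It is only an illustration, not WebKit's actual code: the function name `decode_raw_filename` and its parameters are hypothetical, and it assumes that each step is a strict decode that falls through to the next encoding on failure.

```python
from typing import Optional


def decode_raw_filename(raw: bytes,
                        referrer_encoding: Optional[str] = None,
                        default_encoding: Optional[str] = None) -> str:
    """Decode raw Content-Disposition file name bytes with a fallback chain.

    Hypothetical helper: tries UTF-8, then the referring document's
    encoding, then the browser default encoding, and finally Latin-1.
    """
    for encoding in ("utf-8", referrer_encoding, default_encoding):
        if encoding is None:
            continue  # this hint is not available, move on
        try:
            return raw.decode(encoding)  # strict decoding: raises on bad bytes
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown encoding label, try the next one
    # Latin-1 maps every byte value to a character, so this can never fail.
    return raw.decode("latin-1")


# File name sent as windows-1251 bytes from a page served in windows-1251.
raw_name = "отчёт.txt".encode("windows-1251")
print(decode_raw_filename(raw_name, referrer_encoding="windows-1251"))
```

Dropping the `default_encoding` step gives the Firefox order listed above. The IE behaviour would correspond to a single decode with the system locale's encoding.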

IE's mechanism is obviously the weakest: it only works if the file name encoding happens to match the local user's default. But that's almost always the case for end users. Anyway, if a certain Content-Disposition with raw bytes works in IE _or_ Firefox, it's almost certain to work in Safari, too. If the link works in both, it's pretty certain to work in Safari.

>> Having two sources of file name information in HTTP headers sounds like a very weird idea to me.
> > ...
> It's the format that has been an IETF standard for a VERY long time.
> If you have concerns with this format then you *really* should raise them in the IETF HTTPbis WG, which is revising the spec for Content-Disposition, and plans to submit it for publication soon (it's already past IETF Working Group Last Call).

Perhaps I misunderstood your comment or was unclear myself - I don't have a strong opinion about RFC2231-style encoding. It seems cleaner than raw bytes, but with the de facto standard being raw bytes, it also seems superfluous.
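
For comparison, this is roughly what decoding an RFC2231-style value looks like. It is a simplified sketch that assumes the `charset'language'percent-encoded-value` form; the helper name `decode_ext_value` is made up for illustration, and a real parser would also have to validate the charset and handle malformed input.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> str:
    """Decode a value such as UTF-8''%e2%82%ac%20rates (hypothetical helper)."""
    # The charset is declared explicitly, so no guessing is needed.
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict")


# As it might appear in: Content-Disposition: attachment; filename*=UTF-8''%e2%82%ac%20rates
print(decode_ext_value("UTF-8''%e2%82%ac%20rates"))
```

The difference from raw bytes is that the encoding travels with the value, so the receiver does not need a fallback chain.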

I would welcome it if the standard described what to do with raw bytes, because that's the practical case both browser and server developers need to work with. Obviously, I think that Safari's solution is best for browsers (with the possible addition of RFC2231/5988 support).

It would seem very weird and unfortunate to me if file names were looked up in both Content-Disposition and Link header fields. This is what I referred to as "two sources".

- WBR, Alexey Proskuryakov
