[webkit-dev] HTML5 & Web Links (RFC 5988)

Thu Nov 11 11:38:39 PST 2010

On 11.11.2010 20:10, Alexey Proskuryakov wrote:
>
> On 11.11.2010, at 9:19, Julian Reschke wrote:
>
>>> As far as the Chromium request goes, please consider feature parity with Safari. We've supported non-ASCII file names in Content-Disposition for a while now, and judging by the lack of bug reports, our approach[*] is sufficient for Web compatibility. The only issue I know is with GMail, which blocks Safari server-side, replacing non-ASCII characters with question marks.
>>
>> Do you have information on how frequently it's used?
>
> Raw bytes seem to be the most common representation for non-ASCII file names on the Web. Implementing that fixed all bug reports I had about Web compatibility in that respect (except for GMail, of course), and didn't cause new ones. Some examples were Yahoo! Mail, several file sharing services, and several Korean forums.
>
> This is not surprising, as that's the only way to make a download link that works in both IE and Firefox (at least for target audiences, see below).
>
>> Judging from
>>
>> <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-03.html#rfc.section.C.4>
>>
>> it's not supported in IE, Opera, and Konqueror, so it's definitely not interoperable today (besides, it conflicts with the existing definitions for the header).
>
> The "Encoding Sniffing" column looks somewhat misleading to me, because browsers interpret raw bytes differently. I don't know if any browser "sniffs" encoding in the common sense of the word. But both IE and Firefox support raw bytes in Content-Disposition, although in different ways.

Oh, I called it "sniffing" because according to HTTP/1.1 it's 
ISO-8859-1, and some browsers "sniff" for different encodings.

> IE: Uses "Language for non-Unicode programs" setting. So with the system language set to Russian, Content-Disposition is interpreted as windows-1251 (Cyrillic). I'm not sure what it does if decoding fails.
> Firefox: Tries UTF-8, then referring document's encoding, then Latin-1.
> Safari: Tries UTF-8, then referring document's encoding, then browser default encoding, and then Latin-1, which can never fail.
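
As a rough illustration of the fallback order described for Safari above, here is a
minimal Python sketch. The function and parameter names are hypothetical and are not
taken from any browser's source; it only assumes that each candidate encoding is
tried in turn and that Latin-1 is the last resort because it maps every byte.

```python
from typing import Optional


def decode_raw_filename(
    raw: bytes,
    referrer_encoding: Optional[str] = None,  # hypothetical: encoding of the referring document
    default_encoding: Optional[str] = None,   # hypothetical: browser default encoding
) -> str:
    """Decode a raw-byte filename by trying encodings in a fixed order."""
    for encoding in ("utf-8", referrer_encoding, default_encoding):
        if not encoding:
            continue  # hint not available, skip it
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown encoding name, try the next one
    # Latin-1 assigns a character to every byte value, so this cannot fail.
    return raw.decode("latin-1")


name = "Привет.txt"
print(decode_raw_filename(name.encode("utf-8")))
print(decode_raw_filename(name.encode("windows-1251"), referrer_encoding="windows-1251"))
print(decode_raw_filename(name.encode("windows-1251")))  # no hints: mojibake via Latin-1
```

The last call shows why the result depends on the hints available to the recipient:
without a matching hint the same bytes decode to a different, wrong name.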

Thanks for the details, my data in 
<http://greenbytes.de/tech/tc2231/#attwithisofnplain> and 
<http://greenbytes.de/tech/tc2231/#attwithutf8fnplain> was based on 
blackbox testing. The observable effect, from testing in a Western 
European locale, is that the UAs do not interoperate; some stick to 
8859-1 (Konq, Opera, IE), some "sniff" (Safari, Chrome, FF).

> IE's mechanism is obviously the weakest - it only works if the file name encoding happens to match the local user's default. But that's almost always the case for end users. Anyway, if a certain Content-Disposition with raw bytes works in IE _or_ Firefox, it's almost certain to work in Safari, too. If the link works in both, it's pretty certain to work in Safari.

Indeed; and I wasn't even aware of that because I'm only testing with the 
locale I'm in.

I don't think the IETF will ever approve a standard where the encoding 
depends on the recipient's locale, with no reliable way to find out 
upfront what that locale is.

>>> Having two sources of file name information in HTTP headers sounds like a very weird idea to me.
>>> ...
>>
>> It's the format that has been an IETF standard for a VERY long time.
>>
>> If you have concerns with this format then you *really* should raise them in the IETF HTTPbis WG, which is revising the spec for Content-Disposition, and plans to submit it for publication soon (it's already past IETF Working Group Last Call).
>
> Perhaps I misunderstood your comment or was unclear myself - I don't have a strong opinion about RFC2231-style encoding. It seems cleaner than raw bytes, but with the de facto standard being raw bytes, it also seems superfluous.

I disagree that "raw bytes" are a de facto standard; they do not 
interoperate across UAs (see above)...

> I would welcome it if the standard described what to do with raw bytes, because that's the practical case both browser and server developers need to work with. Obviously, I think that Safari's solution is best for browsers (with the possible addition of RFC2231/5988 support).

The spec (RFC 2616) already says that raw bytes are ISO-8859-1, so UAs 
overriding this are in violation of the spec (IMHO).

Introducing a separate parameter (filename*) that doesn't carry the 
legacy problems is in my opinion the best way to move forward.
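
To make the "filename*" idea concrete, here is a hedged Python sketch of how a server
might produce such a parameter using the RFC 5987 encoding (charset, an empty
language tag, then percent-encoded UTF-8). The helper name is hypothetical, and the
set of characters left unescaped is my reading of the RFC's "attr-char" rule.

```python
from urllib.parse import quote

# Non-alphanumeric characters RFC 5987 allows unescaped ("attr-char");
# quote() already leaves letters, digits, and "-._~" alone.
ATTR_CHAR_EXTRA = "!#$&+^`|"


def content_disposition_ext(filename: str) -> str:
    """Build a Content-Disposition value carrying only the filename* parameter."""
    encoded = quote(filename, safe=ATTR_CHAR_EXTRA)  # percent-encodes UTF-8 bytes
    return "attachment; filename*=UTF-8''" + encoded


print(content_disposition_ext("€ rates.txt"))
# attachment; filename*=UTF-8''%E2%82%AC%20rates.txt
```

Because the charset is stated in the parameter itself, the recipient does not need
to guess it from its locale or from the referring document.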

> It would seem very weird and unfortunate to me if file names were looked up in both Content-Disposition and Link header fields. This is what I referred to as "two sources".

Ah, so that was a misunderstanding.

I was referring to the fact that "Link:" uses the same *encoding* (RFC 
5987) for the "title" parameter (not "filename"). So if a UA were to 
process Link headers to obtain, for instance, chapter titles, it could 
parse "title*" to discover internationalized chapter titles.

So no overlap with C-D, except that maybe the library for decoding 
RFC5987-encoded parameters could be re-used.
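
The kind of re-use meant here could look like the following Python sketch: one
decoder for the RFC 5987 value syntax (charset'language'percent-encoded-bytes) that
serves both "filename*" and "title*". The function name is hypothetical, and header
parsing (splitting out the parameter value) is assumed to have happened already.

```python
from urllib.parse import unquote_to_bytes


def decode_ext_value(value: str) -> str:
    """Decode an RFC 5987 extended value such as UTF-8'en'%E2%82%AC%20rates."""
    # The language tag may be empty and is ignored in this sketch.
    charset, _language, encoded = value.split("'", 2)
    return unquote_to_bytes(encoded).decode(charset)


# Same decoder, two different headers:
print(decode_ext_value("UTF-8''%E2%82%AC%20rates.txt"))  # from Content-Disposition filename*
print(decode_ext_value("UTF-8'en'%E2%82%AC%20rates"))    # from Link title*
```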

Best regards, Julian