[webkit-dev] HTML5 & Web Links (RFC 5988)

Alexey Proskuryakov ap at webkit.org
Thu Nov 11 13:14:10 PST 2010

11.11.2010, в 11:38, Julian Reschke написал(а):

> I don't think the IETF will ever approve a standard where the encoding depends on the recipient's locale, with no reliable way to find out upfront what that locale is.

Yes, that makes good sense to me.

Note that Safari's doesn't rely on OS locale (other than for picking its original default browser encoding, which can then be changed by the user). Surely, some people are allergic to the idea of default browser encoding too, but it's unavoidable in practice - we can't interpret untagged content as Latin-1.

> I disagree that "raw bytes" are a de facto standard; they do not interoperate across UAs (see above)...

I think that we agree about technical details and empirical data now, but describe them differently.

Surely, there is no way (that I'm aware of) to guarantee correct downloaded file name in all browsers for all users. A lot of server operators only care about users in their country, and can reasonably (i.e. with negligible cost to business) rely on Windows locale being set. They can just send raw bytes in language default encoding in Content-Disposition, and that works for them and their clients. For all I know, that's what almost everyone does, and it's "interoperable" for them.

Global operators like Google or Yahoo obviously want to cover many languages at once, and they just send different HTTP headers to different browsers. That's not great, but that's unavoidable unless IE changes - whether changing interpretation of raw bytes or implementing RFC5987, IE would have to change.

> The spec (RFC 2616) already says that raw bytes are ISO-8859-1, so UAs overriding this are in violation of the spec (IMHO).

Yes, that's why I'd really welcome a spec that's closer to reality in this regard. No browser whose vendor cares about markets not covered by Latin-1 can actually treat raw bytes in Content-Disposition as ISO-8859-1. No server operator who wants to serve downloadable content in those markets can stick to ISO-8859-1.

> Introducing a separate parameter (filename*) that doesn't carry the legacy problems is in my opinion the best way to move forward.

As a browser implementor, I don't have a strong opinion about filename*. The actual content I see on the Web uses raw bytes in Content-Disposition, so I mostly care about that being adequately specified, so that at least non-IE browsers could all work the same. Firefox and Safari are already pretty close. It's unfortunate if Chrome does not implement this fallback scheme.

Generally speaking, having no custom encoding is better than having an opaque custom encoding. In my opinion, the ideal situation would be for servers to send raw UTF-8, and for clients to do what Safari and Firefox do (try UTF-8, then fall back to other encodings). This may be unachievable in practice, in which case interoperability via opaque RFC2231-style encoding is a lesser evil.

- WBR, Alexey Proskuryakov

More information about the webkit-dev mailing list