[webkit-dev] WebKit compatibility in India

Maciej Stachowiak mjs at apple.com
Wed Oct 22 19:25:57 PDT 2008


I think the general approach you outline makes sense. The best way to
avoid transcoding content that shouldn't be transcoded is to key it off
both font and site, assuming that a relatively limited list of fonts
and sites would provide enough compatibility.

That means the transcoding would have to be done late, once styles are  
resolved. This is entirely feasible; CSS text-transform already does  
late transformation of text based on styles, and it works right with  
copy/paste and everything:

<div style="text-transform: uppercase">this is a test</div>

So this should be easily doable, and it shouldn't hurt performance in
the common case if we skip the font check entirely on sites that are
not in a qualifying domain.
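A minimal Python sketch of the site-then-font check described above, under stated assumptions: the domain whitelist, the font name `LegacyDevFont`, its mapping entries, and the `TextRun` / `displayed_text` names are all hypothetical, and this is not WebKit code. It only illustrates the ordering: the cheap domain lookup happens first, and the per-run font check (which needs resolved styles) happens only on qualifying domains.

```python
from dataclasses import dataclass

# Hypothetical whitelist: qualifying domain -> legacy font families used there.
# Placeholder entries only; not a real compatibility list.
LEGACY_SITES = {
    "example-hindi-news.test": {"LegacyDevFont"},
}

# Hypothetical per-font character maps (a few illustrative entries).
FONT_TABLES = {
    "LegacyDevFont": {"k": "\u0915", "K": "\u0916"},  # KA, KHA
}


@dataclass
class TextRun:
    text: str         # text as stored in the DOM; never modified here
    font_family: str  # font family, known only after style resolution


def displayed_text(run: TextRun, host: str) -> str:
    """Return the text to render for one run, transcoding only when both
    the site and the resolved font are on the list."""
    fonts = LEGACY_SITES.get(host)
    if fonts is None:
        # Common case: the page is not on a qualifying domain, so the
        # font check is skipped entirely.
        return run.text
    if run.font_family not in fonts:
        # e.g. a Verdana run on the same page is left alone.
        return run.text
    table = FONT_TABLES[run.font_family]
    # Characters without an entry in the table pass through unchanged.
    return "".join(table.get(ch, ch) for ch in run.text)
```

For example, `displayed_text(TextRun("kK", "LegacyDevFont"), "example-hindi-news.test")` returns `"\u0915\u0916"`, while a `"Verdana"` run on the same host, or any run on an unlisted host, comes back unchanged. As with text-transform, the stored text is untouched; only what is rendered changes.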

Regards,
Maciej


On Oct 22, 2008, at 2:21 PM, Brett Wilson wrote:

> Hi everybody,
>
> There was recently something of a controversy regarding Embedded
> OpenType (EOT) support in WebKit. The most important reason to support
> this technology is not for web designers who want custom fonts, but
> because some sites using legacy technology use a custom encoding with
> a custom embedded font to display their non-Latin characters. Most of
> these sites are Indian or in Indic languages.
>
> I am very much not an expert in this area. My goal is to start a
> discussion about "what to do about Indic compatibility" rather than
> "should EOT be supported" in WebKit. Just supporting EOT in WebKit
> would make the sites appear correctly, but it would not address some
> of the basic problems like copy and paste or Google Chrome's full text
> indexing feature.
>
> Waiting for the sites to fix themselves / evangelism (basically what
> all browsers are doing now) is an option, and has apparently had some
> success. Some sites seem to be stuck on old technology, so not using
> Unicode may not be their choice. Sticking with this plan may make
> WebKit adoption possible in the long term, but would not help very
> much in the short term.
>
> Google Search does some special detection to transcode sites that use
> these custom encodings. One approach would be to do the same in the
> browser. The browser would contain a list of domains with problems,
> and a character map table that maps the custom 8-bit encoding to
> Unicode (hopefully there are many fewer encodings than sites).
> Alternatively, it could key off the font name, if we find that these
> are unique enough to identify the encoding (does anybody know if this is
> the case?). All incoming pages would first be checked against this
> list, and if a match was found, it would trigger the converter. I
> found a list of ~100 popular sites that require special encodings that
> we can start with.
>
> Doing this conversion has several challenges:
>
> - It could not be blindly applied to all pages on the site. Many of
> the sites have English pages which we wouldn't want to convert, and if
> the site ever fixes itself to use a standard encoding, we would want
> to be able to automatically pick that up. Some pages declare the
> charset as "x-user-defined", while some list something else (I saw
> ISO-8859-1 but there may be others). I think there would need to be
> a somewhat smart encoding detector here (like auto charset detection  
> today).
>
> - It could not be blindly applied to all content in a single page.
> Many of the pages are a combination of custom-encoded text using an
> EOT font and English (or other language) using a different font. For
> example, see http://www.futuresamachar.com/fs/hindi/index.htm
> ("Duration", "By Post", etc. on the right are coded to use "Verdana"
> to get the regular encoding and would be corrupted if a transcoder was
> applied to the entire page). This makes me wonder what integration
> with WebKit would look like, since depending on CSS means it couldn't
> simply be applied in the normal character set conversion phase during
> parsing.
>
> Are there other approaches that WebKit-based browsers can take to
> getting better compatibility with Indic sites? What problems do people
> more familiar with this area see with the transcoding approach? Could
> it be implemented cleanly and would a whitelist ever have a hope of
> covering the sites that Indian users care about? Or should we continue
> with evangelism and wait?
>
> Brett
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
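The character map table idea from the quoted message can be sketched as a 256-entry byte-to-Unicode table plus a page-level gate on the declared charset. This is a minimal Python sketch under stated assumptions: the table entries are invented for illustration and do not describe any real font's encoding, the names `build_table`, `transcode`, and `should_consider` are hypothetical, and the charset set contains only the two values mentioned in the message.

```python
from typing import Dict, List, Set

# Hypothetical table for one legacy 8-bit font encoding. These entries are
# invented for illustration and do not describe any real font.
LEGACY_TO_UNICODE: Dict[int, str] = {
    0x6B: "\u0915",  # byte for 'k' -> DEVANAGARI LETTER KA
    0x4B: "\u0916",  # byte for 'K' -> DEVANAGARI LETTER KHA
    0x61: "\u093E",  # byte for 'a' -> DEVANAGARI VOWEL SIGN AA
}

# Declared charsets reported on such pages (there may be others).
SUSPECT_CHARSETS = {"x-user-defined", "iso-8859-1"}


def build_table(overrides: Dict[int, str]) -> List[str]:
    # Start from a Latin-1 identity mapping so unmapped bytes (ASCII markup,
    # digits, punctuation) pass through unchanged, then apply the overrides.
    table = [chr(b) for b in range(256)]
    for byte, text in overrides.items():
        table[byte] = text
    return table


TABLE = build_table(LEGACY_TO_UNICODE)


def transcode(raw: bytes) -> str:
    # Iterating over bytes yields ints 0..255, each a direct table index.
    return "".join(TABLE[b] for b in raw)


def should_consider(declared_charset: str, host: str, listed_hosts: Set[str]) -> bool:
    # Page-level gate only: the host must be listed and the declared charset
    # must be one of the suspect values. Per-run font checks would still apply.
    return host in listed_hosts and declared_charset.strip().lower() in SUSPECT_CHARSETS
```

Because of the mixed-font pages described in the second challenge, `transcode` would still have to be applied per text run, only where the resolved font matches, rather than to the whole page. A one-to-one table is also only a starting point: if a legacy encoding stores glyphs in visual order (the Devanagari vowel sign I, for instance, is drawn before its consonant but stored after it in Unicode), the converter would need reordering rules as well.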


