[webkit-dev] Webkit compatibility in India - Transcoding Indic fonts

Wed Nov 19 17:26:23 PST 2008

On Nov 19, 2008, at 10:42 AM, Jungshik Shin (신정식, 申政湜) wrote:

>
>
> 2008/11/6 Prunthaban Kanthakumar <prunthaban at google.com>
>
> Now we can do the following:
> 1. Add an additional condition in the styleDidChange method to check
> whether the font-family is supported by our transcoder (at present a
> fast look-up table should do, because we plan to support only a
> limited set of fonts). This condition will be #ifdefed on
> ENABLE(TRANSCODER_SUPPORT).
>
> Shouldn't this be triggered by (font-family, site) rather than just  
> font-family?

Since we're looking at this as a legacy compatibility feature, and
would like future sites to move to proper Unicode-encoded text, my
first instinct would be to key it on {font, site} pairs. But that
depends on whether we can achieve acceptable Indic browsing results
with just a fixed list of sites.
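
A minimal sketch of the {font, site} check, in C++ since that is what
WebKit is written in. The class name `TranscoderAllowList` and the
sample entries are hypothetical; the sketch assumes font-family and
host names should match case-insensitively (ASCII only), and it does
not show how the current site or computed font-family would be
obtained inside WebCore.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <utility>

// Hypothetical allow-list of (font-family, site) pairs for which legacy
// transcoding is enabled. Entries used with it are placeholders.
class TranscoderAllowList {
public:
    void add(const std::string& fontFamily, const std::string& host) {
        entries_.insert({lower(fontFamily), lower(host)});
    }

    // Both halves of the key are lower-cased (ASCII only) before lookup,
    // so "Foo"/"EXAMPLE.in" matches an entry added as "foo"/"example.in".
    bool contains(const std::string& fontFamily, const std::string& host) const {
        return entries_.count({lower(fontFamily), lower(host)}) != 0;
    }

private:
    static std::string lower(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        return s;
    }

    std::set<std::pair<std::string, std::string>> entries_;
};
```

With a hypothetical entry ("ExampleLegacyFont", "news.example.in"),
the same font on any other host is not transcoded, which keeps the
feature scoped to the legacy sites it is meant for.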

>
> On a related note, I would like to mention that we cannot go
> with the approach of 'one look-up table' per font face and a single
> transcoder that does the look-up for all fonts. The problem is that
> many Indic languages use multiple code points to represent one
> character, and different fonts use different standards! For example,
> there are situations where one glyph in EOT needs to be transcoded to
> 5+ Unicode code points. The reverse situation is also possible.
> Because of these issues, we cannot use a simple look-up table for
> all fonts. This forces us to write some specialized code to handle
> each font (there might also be some fonts where a one-to-one look-up
> table will be enough).
>
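
One way to express m-to-n mappings as data rather than per-font code
is a longest-match table over code point sequences. This is a minimal
sketch, not the Padma algorithm and not how ICU implements
transliterators; the class name `SequenceTranscoder` is hypothetical,
and it assumes the legacy text is already available as code points. It
does not handle mappings that depend on context beyond an explicit
multi-character key.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>

// Each entry maps a sequence of legacy code points to a sequence of
// Unicode code points (1-to-n, n-to-1 and n-to-m all fit). At every
// position the longest matching key wins; code points with no entry
// are copied through unchanged.
class SequenceTranscoder {
public:
    void addMapping(const std::u32string& legacy, const std::u32string& unicode) {
        table_[legacy] = unicode;
        maxKeyLength_ = std::max(maxKeyLength_, legacy.size());
    }

    std::u32string transcode(const std::u32string& input) const {
        std::u32string output;
        std::size_t pos = 0;
        while (pos < input.size()) {
            std::size_t longest = std::min(maxKeyLength_, input.size() - pos);
            bool matched = false;
            // Try the longest candidate key first, then shorter ones.
            for (std::size_t len = longest; len > 0; --len) {
                auto it = table_.find(input.substr(pos, len));
                if (it != table_.end()) {
                    output += it->second;
                    pos += len;
                    matched = true;
                    break;
                }
            }
            if (!matched)
                output += input[pos++];
        }
        return output;
    }

private:
    std::map<std::u32string, std::u32string> table_;
    std::size_t maxKeyLength_ = 0;
};
```

For example, with hypothetical entries `k -> U+0B95`, `ek -> U+0B95
U+0BC6` (a pre-base vowel sign stored before its consonant, as fonts
that store glyphs in visual order do) and `x -> U+0B95 U+0BCD U+0BB7`,
`transcode(U"ek")` yields KA followed by VOWEL SIGN E, and `x` expands
to three code points.
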
> In October, I listed two alternatives for this transformation. One
> is adding ICU converters for Indic font encodings (they can deal with
> m-to-n mappings) and the other is implementing your own. The first
> was ruled out because it's not easy to add new converters on Mac OS
> X, where ICU is part of the OS. There's another approach you can
> take: build ICU transliterator rules, which seems to be the cleanest
> way to do this. You don't need to port or reimplement conversion code
> from another project (e.g. Padma); you just need to 'port' its
> conversion tables to ICU transliterator rules.
>
> This transcoding will be invoked on the content of a text node that
> is already in Unicode, just as 'text-transform: capitalize' or
> 'text-transform: lowercase' is. An ICU transform converts one chunk
> of Unicode text into another chunk of Unicode text
> (http://www.icu-project.org/userguide/Transform.html), so it
> appears to be an almost perfect fit.

This sounds like it would work for any ICU-based port, though it would
prevent the feature from working for ports that use something other
than ICU for Unicode and text transcoding support, most notably the Qt
port. Would it simplify the code significantly to make it an ICU
transformer rather than something custom?

Regards,
Maciej
