[webkit-dev] Webkit compatibility in India - Transcoding Indic fonts

Wed Nov 19 17:38:01 PST 2008

On Wed, 19 Nov 2008 17:26:23 -0800,
Maciej Stachowiak <mjs at apple.com> wrote:

> 
> On Nov 19, 2008, at 10:42 AM, Jungshik Shin (신정식, 申政湜)  
> wrote:
> 
> >
> >
> > 2008/11/6 Prunthaban Kanthakumar <prunthaban at google.com>
> >
> > Now we can do the following,
> > 1. Add an additional condition in styleDidChange method to check
> > if the font-family is supported by our transcoder (at present a
> > fast look-up table should do because we plan to support only a
> > limited set of fonts). This condition will be #ifdefed on
> > ENABLE(TRANSCODER_SUPPORT).
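The check in item 1 might look roughly like the following C++ sketch. It assumes the hook receives the computed font-family name as a std::string; isTranscodableFontFamily, foldFamilyName and the two font names are hypothetical placeholders, and the ENABLE(TRANSCODER_SUPPORT) guard is left out so the sketch compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>

// Fold letters to lower case, since CSS compares font-family names
// case-insensitively. In the default "C" locale only ASCII letters change.
std::string foldFamilyName(std::string name)
{
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return name;
}

// Hypothetical look-up table of legacy font families the transcoder supports.
// The entries are placeholders, not the actual list.
bool isTranscodableFontFamily(const std::string& family)
{
    static const std::unordered_set<std::string> kTranscodableFonts = {
        "exampletamilfont",
        "examplehindifont",
    };
    return kTranscodableFonts.count(foldFamilyName(family)) != 0;
}
```

A hash set keeps the look-up constant-time on average, which is all the proposal needs while the supported list stays small.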
> >
> > Shouldn't this be triggered by (font-family, site) rather than
> > just font-family?
> 
> Since we're looking at this as a legacy compatibility feature, and  
> would like future sites to move to proper Unicode-encoded text, my  
> first instinct would be {font, site} pairs. But that depends on  
> whether we can achieve acceptable Indic browsing results with just a  
> fixed list of sites.
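If the trigger is keyed on {font, site} pairs, the table becomes a set of pairs. A hedged C++ sketch, assuming the caller has already extracted the page's host name and lower-cased both strings; shouldTranscode, FontSitePair and the entries are hypothetical, and the exact-host comparison ignores subdomains, which a real list would probably have to handle.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// A (font family, site host) pair; both parts are assumed to be
// lower-cased already by the caller.
using FontSitePair = std::pair<std::string, std::string>;

// Hypothetical allowlist: transcode only when this font is used on a site
// known to depend on it. Entries are placeholders, not real data.
bool shouldTranscode(const std::string& family, const std::string& host)
{
    static const std::set<FontSitePair> kAllowedPairs = {
        {"exampletamilfont", "news.example.in"},
        {"examplehindifont", "daily.example.in"},
    };
    return kAllowedPairs.count(FontSitePair(family, host)) != 0;
}
```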
> 
> >
> > On a related note, I would like to mention here that, we cannot go  
> > with the approach of 'one look-up table' per font-face and a
> > single transcoder to do the look-up for all fonts. The problem is
> > that many Indic languages use multiple code points to represent one
> > character and different fonts use different standards! For example,
> > there are situations where one glyph in EOT needs to be transcoded
> > to 5+ Unicode code points. The reverse situation is also possible.
> > Due to these issues, we cannot go with a simple look-up table for
> > all fonts. This forces us to write some specialized code to handle
> > each font (there might also be some fonts where a one-to-one
> > look-up table will be enough).
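To make the m-to-n point concrete, here is a hedged C++ sketch of a per-font table that maps runs of legacy code points to runs of Unicode code points by greedy longest match. The table entries used with it are invented for illustration and do not come from any real font; the only real fact relied on is that U+0915 U+094D U+0937 is the three-code-point Unicode spelling of the Devanagari conjunct क्ष, which a legacy font may draw as a single glyph. TranscodeTable and transcode are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-font table: key = code points as stored in the legacy
// font encoding, value = the Unicode sequence they represent. Keys and
// values may differ in length, so both 1-to-n and m-to-1 entries fit.
using TranscodeTable = std::map<std::u32string, std::u32string>;

// Greedy longest-match substitution: at each position try the longest key
// first; code points without an entry are copied through unchanged.
std::u32string transcode(const std::u32string& input, const TranscodeTable& table)
{
    std::size_t maxKeyLength = 0;
    for (const auto& entry : table)
        maxKeyLength = std::max(maxKeyLength, entry.first.size());

    std::u32string output;
    std::size_t i = 0;
    while (i < input.size()) {
        bool matched = false;
        for (std::size_t len = std::min(maxKeyLength, input.size() - i); len > 0; --len) {
            auto it = table.find(input.substr(i, len));
            if (it != table.end()) {
                output += it->second;
                i += len;
                matched = true;
                break;
            }
        }
        if (!matched)
            output += input[i++];
    }
    return output;
}
```

Longest-match substitution covers the one-to-many and many-to-one cases but not reordering: legacy Devanagari fonts commonly store the short-i vowel sign before the consonant it follows in Unicode order, and rules like that are what still call for font-specific code.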
> >
> > In October, I listed two alternatives for this transformation. One  
> > is adding ICU converters for Indic font encodings (it can deal
> > with m-to-n mappings) and the other is implementing your own. The
> > first was ruled out because it's not easy to add new converters on
> > Mac OS X, where ICU is a part of the OS. There's another approach
> > you can take. You can build ICU transliterator rules and it seems
> > to be the cleanest way to do this. You don't need to port/implement
> > conversion code (from another project, e.g. Padma) but just need
> > to 'port' the conversion tables to ICU transliterator rules.
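Porting a table to transliterator rules can be largely mechanical. The hedged C++ sketch below, assuming the tables are available as lists of code-point sequences, writes them out in ICU's "source > target ;" rule syntax with \u escapes. toIcuRules, escapeSequence and Mapping are hypothetical names, and one rule per line is only one way to lay the rules out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

using Mapping = std::pair<std::u32string, std::u32string>;

// Escape one code point in the \uXXXX (BMP) or \UXXXXXXXX (supplementary)
// form accepted in ICU transliterator rules.
std::string escapeCodePoint(char32_t cp)
{
    char buffer[12];
    if (cp <= 0xFFFF)
        std::snprintf(buffer, sizeof buffer, "\\u%04X", static_cast<unsigned>(cp));
    else
        std::snprintf(buffer, sizeof buffer, "\\U%08X", static_cast<unsigned>(cp));
    return buffer;
}

std::string escapeSequence(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
        out += escapeCodePoint(cp);
    return out;
}

// Hypothetical helper: emit forward-only rules of the form "source > target ;".
// Longer sources are emitted first so a short rule cannot mask a longer one
// that starts with the same code points.
std::string toIcuRules(std::vector<Mapping> table)
{
    std::stable_sort(table.begin(), table.end(), [](const Mapping& a, const Mapping& b) {
        return a.first.size() > b.first.size();
    });
    std::string rules;
    for (const Mapping& m : table)
        rules += escapeSequence(m.first) + " > " + escapeSequence(m.second) + " ;\n";
    return rules;
}
```

On an ICU-based port the resulting text could be converted to an icu::UnicodeString and passed to icu::Transliterator::createFromRules with UTRANS_FORWARD; that call is not shown because it needs the ICU headers and libraries. ICU tries rules in order and can reject a rule set in which an earlier rule masks a later one, which is why longer sources are sorted first.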
> >
> > This transcoding will be invoked on the content of a text node
> > already in Unicode, just like 'text-transform: capitalize' or
> > 'text-transform: lowercase' is. An ICU transform converts a chunk
> > of text in Unicode to another chunk of text in Unicode.
> > ( http://www.icu-project.org/userguide/Transform.html ) So, it  
> > appears to be almost a perfect fit.
> 
> This sounds like it would work for any ICU-based port, though it would
> prevent the feature from working for ports that use something other
> than ICU for Unicode and text transcoding support, most notably the
> Qt port. Would it simplify the code significantly to make it an ICU  
> transformer rather than something custom?

Note that the Gtk port is also going to drop ICU in favour of Glib
encoding functionality, so an ICU based solution would already not
apply on two ports.
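One way to keep the feature available on ports without ICU, such as Qt and, per the note above, Gtk, is to hide the conversion behind a small interface that each port implements. A hedged C++ sketch; LegacyFontTranscoder and TableTranscoder are hypothetical names, not existing WebKit classes, and the table backend handles only one-to-n mappings for brevity.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical port-neutral interface: each port decides how legacy text is
// converted (an ICU transliterator, GLib, Qt, or a plain table).
class LegacyFontTranscoder {
public:
    virtual ~LegacyFontTranscoder() = default;
    virtual std::u32string transcode(const std::u32string& text) const = 0;
};

// Table-driven backend with no ICU dependency. For brevity it only maps one
// legacy code point to n Unicode code points; unmapped code points pass through.
class TableTranscoder final : public LegacyFontTranscoder {
public:
    explicit TableTranscoder(std::map<char32_t, std::u32string> table)
        : m_table(std::move(table)) { }

    std::u32string transcode(const std::u32string& text) const override
    {
        std::u32string out;
        for (char32_t cp : text) {
            auto it = m_table.find(cp);
            if (it != m_table.end())
                out += it->second;
            else
                out += cp;
        }
        return out;
    }

private:
    std::map<char32_t, std::u32string> m_table;
};
```

The trade-off is that every non-ICU port would still need its own implementation of each font's mapping, rather than sharing one rule set.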

ciao,
    Christian