[webkit-dev] Webkit compatibility in India - Transcoding Indic fonts

Thu Nov 20 03:00:48 PST 2008

On Thu, Nov 20, 2008 at 12:01 AM, Jungshik Shin (신정식, 申政湜) <
jungshik at google.com> wrote:

>
>
> 2008/11/6 Prunthaban Kanthakumar <prunthaban at google.com>
>
> Hi All,
>>
>> This is a continuation of the mail thread
>> https://lists.webkit.org/pipermail/webkit-dev/2008-October/005495.html
>>
>> I would like to discuss some ways to implement mjs' ideas.
>>
>> As mjs says in the above mail,
>>
>> *In case you look into implementing this, what I'd suggest is an extra
>> CSS property that can be set based on the font property at style resolution
>> time. (since I think the computed font list will strip EOT fonts, so it
>> might be too late to look at it once you are on the rendering side).
>> Something like -webkit-indic-text-decode. *
>>
>> When the code reaches the RenderText::styleDidChange method, the font
>> information is still present in the RenderStyle object associated with the
>> RenderText (because this happens while the HTML file is being parsed, well
>> before font resolution happens). This method checks whether the style
>> includes a text transformation and, if it does, calls setText, forcing it
>> to modify the 'internal text' if needed.
>>
>> We can then do the following:
>> 1. Add an additional condition in the styleDidChange method to check whether
>> the font-family is supported by our transcoder (for now a fast look-up table
>> should do, because we plan to support only a limited set of fonts). This
>> condition will be #ifdefed on ENABLE(TRANSCODER_SUPPORT).
>> 2. In the setTextInternal method, based on the font-family, get the
>> corresponding transcoder (probably from a map) and perform the transcoding.
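
A minimal sketch of the font-family look-up described in the two steps
above, written as standalone C++ rather than actual WebKit code. The names
TranscoderRegistry and Transcoder are hypothetical; a real patch would
presumably use WebKit's own string and hash-map types and sit behind
ENABLE(TRANSCODER_SUPPORT).

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical transcoder signature: font-encoded text in, transcoded text out.
using Transcoder = std::function<std::string(const std::string&)>;

// Hypothetical registry keyed by font-family name (matched case-insensitively).
class TranscoderRegistry {
public:
    void add(const std::string& family, Transcoder transcoder)
    {
        assert(static_cast<bool>(transcoder)); // reject empty callables
        m_transcoders[normalize(family)] = std::move(transcoder);
    }

    // Returns nullptr for unsupported families, so the caller
    // (styleDidChange in the proposal) can skip transcoding.
    const Transcoder* find(const std::string& family) const
    {
        auto it = m_transcoders.find(normalize(family));
        return it == m_transcoders.end() ? nullptr : &it->second;
    }

private:
    // Lowercases ASCII letters so that look-ups ignore case.
    static std::string normalize(const std::string& name)
    {
        std::string result;
        for (unsigned char c : name)
            result.push_back(static_cast<char>(std::tolower(c)));
        return result;
    }

    std::map<std::string, Transcoder> m_transcoders;
};
```

With this shape, step 1 reduces to checking whether find() returns a
non-null pointer, and step 2 to calling the transcoder it points to.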
>>
>> Later, when font resolution happens, the EOT font will be ignored and
>> WebKit will choose a default font based on the code points of the
>> characters, so the correct characters will appear on the screen.
>> After setTextInternal there is also a layout and width recalculation,
>> which matters for us because we modify the characters. So the
>> RenderText::setTextInternal method seems to be the ideal place to plug in
>> the transcoder.
>>
>> On a related note, we cannot go with the approach of one look-up table
>> per font face and a single transcoder that does the look-up for all fonts.
>> The problem is that many Indic languages use multiple code points to
>> represent one character, and different fonts use different standards! For
>> example, there are situations where one glyph in an EOT font needs to be
>> transcoded to 5+ Unicode code points. The reverse situation is also
>> possible. Because of this, a simple look-up table will not work for all
>> fonts, which forces us to write some specialized code for each font
>> (although for some fonts a one-to-one look-up table may be enough).
>
>
>
> In October, I listed two alternatives for this transformation. One is
> adding ICU converters for Indic font encodings (it can deal with m-to-n
> mappings) and the other is implementing your own. The first was ruled out
> because it's not easy to add new converters on Mac OS X, where ICU is a part
> of the OS. There's another approach you can take: you can build ICU
> transliterator rules, which seems to be the cleanest way to do this. You
> don't need to port or implement conversion code from another project (e.g.
> Padma); you just need to 'port' the conversion tables to ICU transliterator
> rules.
>
> This transcoding will be invoked on the content of a text node that is
> already in Unicode, just as 'text-transform: capitalize' or 'text-transform:
> lowercase' is. An ICU transform converts one chunk of Unicode text to
> another chunk of Unicode text
> ( http://www.icu-project.org/userguide/Transform.html ), so it appears to
> be almost a perfect fit.
>

I do not know much about ICU transformers. From the link above, my
understanding is that transformers perform 'transliteration', such as
converting from English to Hindi. I am not sure how this can be used to
transcode Indic fonts. (ICU converters are the ones that transcode from one
encoding to another, but from what you have said, it looks like ICU
converters are not the way to go.)

Also, what we are trying to do is transcode characters that are actually
in the ASCII range (whose glyphs are "hacked" by font designers to render
Indic characters) to Unicode characters of the corresponding language. So to
what extent is a transformer going to help us? In our case each font
(or in some cases a set of fonts, due to some standardization efforts in the
past) will have its own ASCII-to-Unicode mapping (which is m-to-n), and the
purpose of ICU transformers seems to be different from this.
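
To make the m-to-n point concrete, here is a rough sketch of a table-driven
transcoder that always takes the longest matching key. It is standalone C++,
not WebKit code; MappingTable and transcode are hypothetical names, and any
table entries passed to it are placeholders for illustration, not taken from
any real font. A real per-font transcoder may also need reordering rules
that a plain table like this cannot express.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-font table: a run of one or more ASCII-range characters
// maps to a run of one or more Unicode code points (m-to-n).
using MappingTable = std::map<std::string, std::u32string>;

std::u32string transcode(const std::string& input, const MappingTable& table)
{
    // The longest key bounds how far ahead we have to look.
    std::size_t maxKeyLength = 0;
    for (const auto& entry : table) {
        assert(!entry.first.empty()); // an empty key could never advance
        maxKeyLength = std::max(maxKeyLength, entry.first.size());
    }

    std::u32string output;
    std::size_t pos = 0;
    while (pos < input.size()) {
        bool matched = false;
        // Try the longest candidate first so that "ab" wins over "a".
        std::size_t longest = std::min(maxKeyLength, input.size() - pos);
        for (std::size_t len = longest; len > 0; --len) {
            auto it = table.find(input.substr(pos, len));
            if (it != table.end()) {
                output += it->second;
                pos += len;
                matched = true;
                break;
            }
        }
        if (!matched) {
            // Unmapped character: pass it through unchanged.
            output.push_back(static_cast<unsigned char>(input[pos]));
            ++pos;
        }
    }
    return output;
}
```

Each supported font (or family of fonts sharing a standard) would supply its
own table, and fonts that need more than table look-up would need their own
code on top of this.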

>
>
> Jungshik
>
> P.S. BTW, I filed https://bugs.webkit.org/show_bug.cgi?id=22339 for this
> task.
> If you haven't filed one, why don't you use 22339 for uploading a prototype
> patch for one (site, font) pair as Brett suggested?
>

Thanks. I will use that. Once we decide on the approach, I will go ahead
and implement it and attach a patch to the bug you created.

>
>
>
>
>
>>
>>
>> I would like to hear from you about this. Is this approach fine or do you
>> have any issues or suggestions?
>>
>> Regards,
>> Prunthaban