[Webkit-unassigned] [Bug 110230] [harfbuzz] Always pass correct text direction to HarfBuzz

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Feb 19 14:17:07 PST 2013


https://bugs.webkit.org/show_bug.cgi?id=110230





--- Comment #2 from Behdad Esfahbod <behdad at google.com>  2013-02-19 14:19:30 PST ---
(In reply to comment #1)
> (In reply to comment #0)
> > FWIW, we should *always* pass the correct direction, script, and language (if known) to harfbuzz.
> 
> How are you defining "correct"?

Correct is whatever direction the Unicode Bidirectional Algorithm (UBA) resolves for the piece of text.  The UBA is run before shaping happens.


> Do you have a counter example showing the passing of an "incorrect" direction, script, or language?

Yes.  Normally, Arabic runs right-to-left.  But you can force it to go left-to-right using a special Unicode character (LRO, U+202D LEFT-TO-RIGHT OVERRIDE) or the <bdo> tag.  When Arabic runs left-to-right, it "shapes" to different glyphs than when it goes right-to-left, because the shaping depends on what actually comes to the left and right of each character.  If you measure the text without telling HarfBuzz it's left-to-right, it will assume that it's right-to-left, because that's the default direction for Arabic.  And you get wrong results.
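
To make the difference between the default and the forced direction concrete, here is a rough Python sketch.  It is not an implementation of the UBA and it does not call HarfBuzz.  It only looks for an override character or the first strongly-directional character, using the standard library's unicodedata module.  The function name run_direction is made up for illustration.

```python
import unicodedata

def run_direction(text):
    """Return "ltr" or "rtl" for a run of text.

    Simplified assumption: an override character (LRO/RLO) wins if it
    appears before any strong character; otherwise the first strong
    character decides.  The real UBA does much more than this.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "LRO":          # U+202D LEFT-TO-RIGHT OVERRIDE
            return "ltr"
        if bidi == "RLO":          # U+202E RIGHT-TO-LEFT OVERRIDE
            return "rtl"
        if bidi == "L":            # strong left-to-right (e.g. Latin)
            return "ltr"
        if bidi in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    return "ltr"                   # no strong character: fall back to LTR

arabic = "\u0633\u0644\u0645"                 # SEEN, LAM, MEEM
print(run_direction(arabic))                  # default direction for Arabic
print(run_direction("\u202d" + arabic + "\u202c"))  # same letters, forced by LRO
```

The two calls see the same Arabic letters but resolve to different directions, which is exactly the information the shaper cannot recover on its own if it only sees the letters.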

Try selecting this piece of text:

data:text/html;charset=utf-8,<html><body style="font-size: 700px"><bdo dir=ltr>%D8%B3%D9%84%D9%85</body>

The desired behavior is that it should behave the same as this:

data:text/html;charset=utf-8,<html><body style="font-size: 700px">%D9%85%D9%84%D8%B3</body>

The second test has the Arabic characters reversed, and running right-to-left.  The first one has them forced left-to-right.
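
If you want to check what the two test cases contain, a small Python sketch like the following decodes the percent-encoded payloads from the two data: URLs above.  The helper name codepoints is made up for illustration; only the standard library is used.

```python
from urllib.parse import unquote

# Percent-encoded UTF-8 payloads copied from the two data: URLs.
forced_ltr = unquote("%D8%B3%D9%84%D9%85")    # inside <bdo dir=ltr>
reversed_rtl = unquote("%D9%85%D9%84%D8%B3")  # plain, natural direction

def codepoints(s):
    """List the code points of a string in logical (memory) order."""
    return ["U+%04X" % ord(c) for c in s]

print(codepoints(forced_ltr))
print(codepoints(reversed_rtl))

# The second string is the first one with its characters reversed.
print(forced_ltr[::-1] == reversed_rtl)
```

The first string is SEEN, LAM, MEEM in logical order and the second is MEEM, LAM, SEEN, so laying the first out left-to-right and the second right-to-left puts the same letters in the same visual positions.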


> Since script is not specified by the author and the HTML5 et al specs do not formally define an algorithm for mapping some sequence of text to script, then how are you defining "correct" in this regard?

Right.  Unicode defines Script per character.  All text rendering implementations have heuristics to assign a script to characters of type Script=Common and Script=Inherited.  Such characters take their script from surrounding characters.  For example, a U+002E FULL STOP character assumes the Script=Arabic property when used in Arabic text.
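
As an illustration of that kind of heuristic, here is a minimal Python sketch.  It is not what HarfBuzz or WebKit actually does.  Python's standard library does not expose the Unicode Script property, so rough_script below is a crude stand-in based on character names and categories, and both function names are made up for this example.

```python
import unicodedata

def rough_script(ch):
    """Crude stand-in for the Unicode Script property (an assumption,
    not real script data): combining marks count as Inherited, letters
    are classified by the first word of their character name."""
    cat = unicodedata.category(ch)
    name = unicodedata.name(ch, "")
    if cat.startswith("M"):
        return "Inherited"
    if cat.startswith("L"):
        if name.startswith("ARABIC"):
            return "Arabic"
        if name.startswith("LATIN"):
            return "Latin"
        return "Other"
    return "Common"

def resolve_scripts(text):
    """Give Common/Inherited characters the script of a neighbour."""
    scripts = [rough_script(ch) for ch in text]
    weak = ("Common", "Inherited")

    # Forward pass: inherit from the nearest preceding real script.
    last = None
    for i, s in enumerate(scripts):
        if s in weak:
            if last is not None:
                scripts[i] = last
        else:
            last = s

    # Backward pass: leading weak characters take the script that follows.
    following = None
    for i in range(len(scripts) - 1, -1, -1):
        if scripts[i] in weak:
            if following is not None:
                scripts[i] = following
        else:
            following = scripts[i]
    return scripts

# SEEN, LAM, MEEM followed by U+002E FULL STOP
print(resolve_scripts("\u0633\u0644\u0645."))
```

In this sketch the FULL STOP ends up tagged as Arabic.  If every character in the text is Common or Inherited, there is nothing to borrow from and the list comes back unchanged.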


> How are you dealing with cases where some font formats define multiple values for "script" tags based on different versions of the font technology? For example, see [1] (script tag 'dev2' with post-2005 specifications) versus [2][3] (script tag 'deva' with pre-2005 implementations):

HarfBuzz knows about those.  You can ignore it.  What we're interested in is the Unicode script assigned to the piece of text.  This, again, can be guessed by HarfBuzz, except for the case where the whole piece of text has Script=Common or Script=Inherited.  Letting HarfBuzz guess the script can result in inferior shaping, but that is not as serious as letting HarfBuzz guess the text direction, which has much more severe implications.
