[webkit-dev] Fwd: Fwd: Fwd: HTML5 & MathML3 entities

Alexey Proskuryakov ap at webkit.org
Fri Sep 17 14:34:44 PDT 2010



Begin forwarded message:

> From: David Carlisle <davidc at nag.co.uk>
> Date: September 17, 2010 14:28:33 PDT
> To: Alexey Proskuryakov <ap at webkit.org>
> Subject: Re: [webkit-dev] Fwd: Fwd: HTML5 & MathML3 entities
> 
> On 17/09/2010 21:12, Alexey Proskuryakov wrote:
>> 
>> On 17.09.2010, at 12:49, Alexey Proskuryakov wrote:
>> 
>>> math fonts in unicode layout typically use the opposite choice for
>>> the old character, and math renderers (including if I remember
>>> correctly mozilla's original mathml support) that map unicode slots
>>> to "legacy" 8 bit math font encodings (eg the TeX or mathematica
>>> fonts) also rendered these things to match the math operators.
>> 
>> I'd say that this was a misuse of the character point,
> 
> No, the code point is in the math symbols block and was always intended
> for math usage. Some time after the code point was added (I think, I
> don't have the data to hand) it was given a canonical mapping to the 3xxx
> block. That was an error that the unicode consortium is now trying to
> correct (or at least was trying to, back when unicode 3.x added this new
> character).
> 
>> and thus a bug in the font. Unicode fonts aren't supposed to use
>> glyphs with different meanings in the same code points (although some
>> may claim that CJK unification crosses the border in that respect).
>> 
>>> So given, as you say, that some change was inevitable, taking the
>>> math choice is I still think the right one
>> 
>> 
>> It just seems that Unicode consortium and W3C basically decided to
>> make opposite choices on this,
> 
> No, the change was explicitly at the suggestion of the UTC, who wanted to
> deprecate the use of the old character and clarify (because the
> character was always intended for math use) that any uses of the old one
> are replaced by the new one.
> 
>> which was not very helpful. &rang used to be a synonym to U+232A, but
>> now the former is a CJK character, and the latter is a math one. I
>> think that consistency with Unicode consortium's choice is an
>> important consideration.
> 
> I think you have "former" and "latter" swapped in the above?
> 
> &rang used to map to 232A, which was originally a math character but was
> then erroneously given a canonical mapping to 3009.
> 
> It now maps to 27E9, which is a character added by the UTC explicitly
> to be a replacement for the deprecated 232A, which is only deprecated
> because it had been given an erroneous NFC mapping.
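
The canonical mapping described above can be observed with any Unicode normalizer. The following is only a sketch added for illustration, not part of the original exchange; it assumes Python 3 and its standard `unicodedata` module, and the constant and function names are made up for this example.

```python
import unicodedata

# Names chosen for this sketch only.
OLD_RANG = "\u232a"  # RIGHT-POINTING ANGLE BRACKET (the deprecated one)
CJK_RANG = "\u3009"  # RIGHT ANGLE BRACKET (CJK punctuation)
NEW_RANG = "\u27e9"  # MATHEMATICAL RIGHT ANGLE BRACKET


def nfc(ch):
    """Return the NFC-normalized form of a single character."""
    return unicodedata.normalize("NFC", ch)


# The old character has a canonical (singleton) decomposition, so NFC
# replaces it with the CJK bracket; the new character is left alone.
print(f"U+{ord(OLD_RANG):04X} -> U+{ord(nfc(OLD_RANG)):04X}")
print(f"U+{ord(NEW_RANG):04X} -> U+{ord(nfc(NEW_RANG)):04X}")
```

Any text pipeline that applies NFC would therefore turn a math bracket written as 232A into the CJK one, which is the breakage the replacement character avoids.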
> 
> As I said above, the definition as given in the spec _is_ the choice of
> the Unicode consortium (or at least the choice of those members of the
> UTC who corresponded with me).
>> 
> 
>> But if I understood you correctly, there is a significant amount of
>> MathML documents that use rang/lang, is that accurate?
> 
> Oh yes, angle brackets are a very common notation.
> 
>> Practical compatibility requirements are more important than potential
>> issues that I cited, at least as long as they remain potential. If I
>> had an example of brokenness, I'd say that HTML content that followed
>> specs outweighs MathML content that violated them.
> 
> MathML did not violate any specs.
> 
> The lang and rang entity names come from the ISO math entity sets, where
> they denote math angle brackets. These sets and these names predate
> Unicode and predate HTML. It's unfortunate that after the names were
> mapped to unicode a canonical mapping to a different character was added,
> but the only fix the UTC suggests for that is to not use 2329 at all and
> use 27E8 instead, which is what the entity spec recommends.
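
For readers who want to compare the old and new entity definitions, here is a small sketch (an illustration added here, not from the original message). It assumes Python 3, whose standard library happens to carry both tables: `html.entities.name2codepoint` holds the HTML 4 definitions and `html.unescape` uses the HTML5 ones. The helper name `entity_codepoints` is hypothetical.

```python
import html
from html.entities import name2codepoint


def entity_codepoints(name):
    """Return (HTML 4 code point, HTML5 code point) for an entity name."""
    old = name2codepoint[name]             # HTML 4 entity table
    new = ord(html.unescape(f"&{name};"))  # HTML5 entity table
    return old, new


for name in ("lang", "rang"):
    old, new = entity_codepoints(name)
    print(f"&{name}; HTML 4: U+{old:04X}  HTML5: U+{new:04X}")
```

The two columns differ exactly as described in the message: the HTML 4 table points at 2329/232A, while the HTML5 table points at 27E8/27E9.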
>> 
>> - WBR, Alexey Proskuryakov
>> 
>> 
> David

- WBR, Alexey Proskuryakov



