[webkit-dev] HTML5 & MathML3 entities

Sausset François sausset at gmail.com
Sat Jul 10 11:50:36 PDT 2010


I'm not sure I understand everything, but the link you gave doesn't address the case where an entity must be translated to two Unicode characters rather than a single one, as is the case with the current hash table system.

Such two-character entities don't exist in the HTML 5 entity list, but some are present in the list used by MathML 3 (link in my previous message).
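
To make the point concrete, here is a minimal sketch (in Python rather than WebKit's C++, purely for illustration) of a prefix tree whose entries store a sequence of code points instead of a single one. The class and function names are hypothetical and do not correspond to anything in the WebKit source; the three sample entities are just a tiny excerpt, with NotEqualTilde (U+2242 U+0338) standing in for the two-character entities of the MathML 3 list.

```python
# Hypothetical sketch, not WebKit code: a prefix tree (trie) whose entries
# hold a tuple of code points, so one name can expand to one or two characters.

class EntityTrie:
    def __init__(self):
        self.children = {}       # next character -> child node
        self.code_points = None  # tuple of ints if an entity name ends here

    def insert(self, name, code_points):
        """Register an entity name with the code points it expands to."""
        node = self
        for ch in name:
            node = node.children.setdefault(ch, EntityTrie())
        node.code_points = tuple(code_points)

    def longest_match(self, text):
        """Return (matched_length, expansion) for the longest entity name
        that is a prefix of text, or (0, None) if there is none."""
        node = self
        best = (0, None)
        for i, ch in enumerate(text):
            node = node.children.get(ch)
            if node is None:
                break
            if node.code_points is not None:
                best = (i + 1, "".join(chr(cp) for cp in node.code_points))
        return best


def build_sample_trie():
    """Build a trie with a few sample entities (illustrative excerpt only)."""
    trie = EntityTrie()
    trie.insert("amp;", [0x0026])
    trie.insert("alpha;", [0x03B1])
    trie.insert("NotEqualTilde;", [0x2242, 0x0338])  # two code points
    return trie
```

With this layout, looking up the text following an "&" returns the whole expansion as a string, so the caller does not need to care whether the entity maps to one or two characters.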

François Sausset


On Jul 10, 2010, at 21:17, Adam Barth wrote:

> On Sat, Jul 10, 2010 at 11:10 AM, Sausset François <sausset at gmail.com> wrote:
>> I just saw that when looking at the code myself.
>> What exactly do you mean by a prefix tree?
> 
> http://en.wikipedia.org/wiki/Trie
> 
>> I also noticed that the entity parser does not take into account combined
>> Unicode characters (see §A.3 in: http://www.w3.org/TR/xml-entity-names/).
>> In addition, even without entities, combined characters are displayed as
>> separate ones.
> 
> My understanding is that is the correct behavior w.r.t. the HTML5
> specification of entity parsing.  Our entity processing aims for
> perfect compliance with this algorithm:
> 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references
> 
> My belief is that the only things we're missing for perfect compliance are
> the expanded list of entity names:
> 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html#named-character-references
> 
> and the prefix tree.
> 
> Adam


