[Webkit-unassigned] [Bug 168182] Update custom line breaking iterators to the latest version of Unicode

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Sat Feb 11 20:04:53 PST 2017


https://bugs.webkit.org/show_bug.cgi?id=168182

--- Comment #1 from Myles C. Maxfield <mmaxfield at apple.com> ---
I went through the breaking rules line by line and compared them to ICU's 54.1 release[1]. I found a few things:

1. This version of ICU has no concept of strict vs. loose line breaking. Therefore, my comparisons ignore the loose/normal/strict pieces of our custom rules. These pieces work by adding characters to, or removing them from, the existing Unicode sets. Newer Unicode does have a concept of strict / loose rules.
2. Our emoji handling is custom, and not included in ICU.
3. We have a couple of declarations hidden behind ADDITIONAL_EMOJI_SUPPORT which ICU includes.
4. There are three constructions we have which the open source rules don't have:
$EXcm $INcm;
$CM* $IN $CM* $EX;
$CM+ $RI;

We should probably just opt all ports into the ADDITIONAL_EMOJI_SUPPORT flag and delete the flag.

The first two of the differing constructions have to do with characters in the inseparable class, and the third one has to do with regional indicators. I'm not sure, but my current theory is that these are just an oversight and aren't necessary.
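
A line-by-line comparison like this could be partly automated. The following is only a rough sketch, not the procedure actually used for this bug: it assumes each rule ends with ";", that "#" only ever starts a comment, and that whitespace differences are insignificant. The helper names parse_rules and rules_only_in are made up for illustration, and the sample inputs are tiny stand-ins rather than the real rule files.

```python
def parse_rules(text):
    """Return the set of rules in a break-rule file, whitespace-normalized.

    Assumptions (not verified against the real files): '#' only starts a
    comment, and ';' only terminates a rule.
    """
    # Strip trailing comments from every line, then treat the file as one string.
    without_comments = " ".join(line.split("#", 1)[0] for line in text.splitlines())
    # Split on the rule terminator and collapse runs of whitespace inside each rule.
    return {" ".join(rule.split()) for rule in without_comments.split(";") if rule.strip()}


def rules_only_in(ours, theirs):
    """Return the rules present in `ours` but absent from `theirs`, sorted."""
    return sorted(parse_rules(ours) - parse_rules(theirs))


if __name__ == "__main__":
    # Illustrative stand-ins only; "$CR $LF;" is a placeholder shared rule.
    ours = "$CR $LF;\n$EXcm   $INcm;  # custom\n$CM+ $RI;\n"
    theirs = "# upstream\n$CR  $LF;\n"
    for rule in rules_only_in(ours, theirs):
        print(rule)
```

A purely textual diff like this would not notice rules that are equivalent but spelled differently (for example, through differently named variables), so the output would still need a manual pass.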

[1] http://source.icu-project.org/repos/icu/icu/tags/release-54-1/source/data/brkitr/line.txt
