[Webkit-unassigned] [Bug 61561] add support for "on demand" webfonts

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Aug 8 13:43:26 PDT 2011


--- Comment #10 from Brian Stell <bstell at google.com>  2011-08-08 13:43:25 PST ---
It has been a while (it has taken me some time to get meaningful mapreduces).

> So rather than inventing a new way for client/server to communicate, 
> I'd try to latch onto existing concepts. We know a font can be broken 
> up into multiple files server-side and you can serve up pieces based 
> off unicode ranges.

For many scripts, using Unicode ranges is likely to give a good win. For example, Droid Sans has ~2500 characters, but fewer than 100 are needed for US English docs, fewer than 256 for most European docs, and fewer than 100 for Hebrew. Splitting by Unicode range would reduce the webfont size significantly, and in general most web pages using one of these scripts know which script they are using.
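
A minimal sketch of the idea, assuming a font is split server-side into a few named subsets keyed by Unicode range. The subset names, the `SUBSETS` table, and both helper functions are hypothetical and chosen only for illustration; the ranges are the printable parts of the Basic Latin, Latin-1 Supplement, and Hebrew blocks.

```python
# Hypothetical subset boundaries; a real deployment would pick its own.
SUBSETS = {
    "latin": [(0x0020, 0x007E)],      # printable Basic Latin
    "latin-ext": [(0x00A0, 0x00FF)],  # printable Latin-1 Supplement
    "hebrew": [(0x0590, 0x05FF)],     # Hebrew block
}


def split_by_range(codepoints, subsets=SUBSETS):
    """Group code points into named subsets; leftovers go to 'other'."""
    groups = {name: set() for name in subsets}
    groups["other"] = set()
    for cp in codepoints:
        for name, ranges in subsets.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                groups[name].add(cp)
                break
        else:
            # No subset claimed this code point.
            groups["other"].add(cp)
    return groups


def subsets_needed(text, subsets=SUBSETS):
    """Return the sorted names of the subsets a page's text would pull in."""
    groups = split_by_range({ord(ch) for ch in text}, subsets)
    return sorted(name for name, cps in groups.items() if cps)
```

Under these assumptions, an English-only page would pull in just the `latin` subset, while a page containing `caf\u00e9` would need both `latin` and `latin-ext`.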

Can I get some help prototyping code to have WebKit request this? 

> If unicode range does a poor job of addressing specific sets of 
> characters that might be very scattered, then maybe what's needed is 
> a set of keywords for unicode range that could represent those sets 
> (and avoid the author having to define some giant list of single 
> characters in the unicode range).

While some scripts will benefit from Unicode ranges, CJK is a very different matter. Unicode Han unification scatters the interesting 7K Chinese (combined Simplified/Traditional) and 4K Japanese characters over 20K code positions in a pattern that has nothing to do with popularity (KangXi radical-stroke ordering).
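
To illustrate why scattered characters are awkward to express as ranges, here is a small sketch that collapses a set of code points into a `unicode-range`-style list. The function name `to_unicode_range` is hypothetical. Contiguous code points merge into one range, but characters that are far apart each need their own entry.

```python
def to_unicode_range(codepoints):
    """Collapse code points into a comma-separated unicode-range style string."""
    runs = []
    for cp in sorted(set(codepoints)):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp       # extend the current contiguous run
        else:
            runs.append([cp, cp])  # start a new run
    return ", ".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}"
        for lo, hi in runs
    )
```

For example, `A`, `B`, `C` collapse to the single range `U+0041-0043`, whereas three Han characters such as U+4E00, U+4E09, and U+4E8C stay as three separate entries. A popularity-based subset of several thousand scattered characters would produce a correspondingly long list.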

We've done mapreduces to figure out the popular characters in CJK pages on the web. It takes around 4K characters to cover 75% of Japanese web pages. It takes about 4K to cover 75% of Korean characters. It appears that Chinese may be in the same range, but we need to redo the mapreduces separating Simplified and Traditional Chinese. (Above the 75% popularity range there is uncertainty, since the results include some questionable characters; hence we are redoing the mapreduces.) A subset of 4K chars is about 25% of a CJK font, so splitting the font based on popularity to hit 75% of web pages is only a modest win, since 25% of all CJK web pages would require additional subsets.
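
A rough sketch of how a popularity-based subset size could be computed from a corpus. This is not the mapreduce used for the numbers above. It assumes that characters are ranked by how many documents contain them and that a page counts as "covered" only when every one of its characters falls inside the top-k subset. The function name is hypothetical.

```python
import math
from collections import Counter


def subset_size_for_coverage(docs, fraction):
    """Smallest k such that the k most popular characters fully cover
    at least `fraction` of the documents (under the assumptions above)."""
    doc_sets = [set(d) for d in docs]
    if not doc_sets:
        return 0
    # Document frequency: in how many documents does each character appear?
    df = Counter(ch for s in doc_sets for ch in s)
    # Rank 1 is the most popular character; ties are broken by code point.
    ranked = sorted(df, key=lambda ch: (-df[ch], ch))
    rank = {ch: i + 1 for i, ch in enumerate(ranked)}
    # A document is covered by the top-k subset iff its rarest character has rank <= k.
    needed = sorted(max((rank[ch] for ch in s), default=0) for s in doc_sets)
    idx = max(math.ceil(fraction * len(needed)) - 1, 0)
    return needed[idx]
```

With the toy corpus `["ab", "ab", "abc", "abcd"]`, the two most popular characters cover half of the documents, while covering 75% requires three.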

The mapreduce results for per-document subsetting show that 90% of CJK docs only need 600 characters. Relative to the 20K characters in a font this is a big win.
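
The per-document measurement can be sketched in the same spirit: count the distinct characters each document uses, then read off the count that a given fraction of documents stays within. Both function names are hypothetical, and ignoring whitespace is an assumption made for this example.

```python
import math


def unique_chars(text):
    """Distinct non-whitespace characters a document would need from the font."""
    return {ch for ch in text if not ch.isspace()}


def chars_needed_at_percentile(docs, fraction):
    """Largest per-document character count among the smallest `fraction` of docs."""
    counts = sorted(len(unique_chars(d)) for d in docs)
    if not counts:
        return 0
    idx = max(math.ceil(fraction * len(counts)) - 1, 0)
    return counts[idx]
```

Applied to a real corpus, `chars_needed_at_percentile(docs, 0.9)` would give the figure corresponding to the "90% of docs" statement above. Dividing it by the font's total character count gives the relative size of a per-document subset.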
