[Webkit-unassigned] [Bug 165601] [GTK] WebKitWebProcess at 100% CPU loading hyphenation dictionaries (ASSERTION FAILED: xPos + prefixWidth <= availableWidth)

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Jan 9 01:00:18 PST 2017


Zan Dobersek <zan at falconsigh.net> changed:

           What    |Removed                     |Added
                 CC|                            |zan at falconsigh.net

--- Comment #5 from Zan Dobersek <zan at falconsigh.net> ---
I'm using ToT on Debian stretch.

What locale are you using? Specifically, what locale is being passed (via `localeIdentifier`) to lastHyphenLocation?

How many dictionaries for that locale are stored under /usr/share/hyphen or /usr/local/share/hyphen?

In my case, I'm using the 'en' locale, meaning that related locales will be mapped via the availableLocales() HashMap to three dictionary files: hyph_en_GB.dic, hyph_en_AU.dic and hyph_en_ZA.dic, as provided by the hyphen-en-gb package.
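To make the mapping concrete, here is a rough sketch of how file names like hyph_en_GB.dic could end up grouped under both the full locale and the bare language. It is not the WebKit code: `localeFromDictionaryName` and `buildLocaleMap` are made-up names, and a `std::map` stands in for the availableLocales() HashMap.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: extracts "en_GB" from "hyph_en_GB.dic".
// Returns an empty string for names that do not match hyph_<locale>.dic.
std::string localeFromDictionaryName(const std::string& fileName)
{
    const std::string prefix = "hyph_";
    const std::string suffix = ".dic";
    if (fileName.size() <= prefix.size() + suffix.size())
        return {};
    if (fileName.compare(0, prefix.size(), prefix) != 0)
        return {};
    if (fileName.compare(fileName.size() - suffix.size(), suffix.size(), suffix) != 0)
        return {};
    return fileName.substr(prefix.size(), fileName.size() - prefix.size() - suffix.size());
}

// Hypothetical helper: maps both the full locale ("en_GB") and its
// language part ("en") to the dictionary files that serve it.
std::map<std::string, std::vector<std::string>> buildLocaleMap(const std::vector<std::string>& fileNames)
{
    std::map<std::string, std::vector<std::string>> locales;
    for (const auto& fileName : fileNames) {
        std::string locale = localeFromDictionaryName(fileName);
        if (locale.empty())
            continue;
        locales[locale].push_back(fileName);
        auto separator = locale.find('_');
        if (separator != std::string::npos && separator > 0)
            locales[locale.substr(0, separator)].push_back(fileName);
    }
    return locales;
}
```

With the three files from the hyphen-en-gb package as input, the "en" key in this sketch ends up with three entries, which is the situation described above.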

We're storing these in a TinyLRUCache with a default capacity of 4, meaning that they will fit in just fine.

If I override the locale to be 'es', the Spanish locale, then related locales will be mapped via the availableLocales() HashMap to 21 dictionary files. In lastHyphenLocation(), we iterate over these locales and retrieve a HyphenDictionary object from the TinyLRUCache for each one, until we find a locale that works for us. But because the TinyLRUCache has a default capacity of 4, the dictionary objects stored in it keep getting evicted and replaced with new ones.

With the 'en' locale, loading the Python PEP page, only three HyphenDictionary objects are created, with the underlying hyph_en_*.dic file opened and processed.

With the 'es' locale, loading the same page, 1723 HyphenDictionary objects are created, loading each hyph_es_*.dic file 82 times. It still doesn't spin the CPU usage of the WebProcess to 100% on my system, but it's obviously a problem.
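The thrashing can be reproduced with a toy model. The sketch below is not WebKit's TinyLRUCache; `ToyLRUCache` and `countLoads` are hypothetical, and it assumes every lookup walks all candidate dictionaries in the same order without stopping early.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy LRU cache (an assumption for illustration, not TinyLRUCache).
// It only counts how often a "dictionary" would have to be loaded.
class ToyLRUCache {
public:
    explicit ToyLRUCache(std::size_t capacity)
        : m_capacity(capacity)
    {
        assert(capacity > 0);
    }

    // Returns true on a hit. On a miss, "loads" the entry and evicts
    // the least recently used one if the cache is full.
    bool get(const std::string& key)
    {
        auto it = std::find(m_entries.begin(), m_entries.end(), key);
        if (it != m_entries.end()) {
            // Move the entry to the most recently used position (the back).
            std::rotate(it, it + 1, m_entries.end());
            return true;
        }
        ++m_loads;
        if (m_entries.size() == m_capacity)
            m_entries.erase(m_entries.begin());
        m_entries.push_back(key);
        return false;
    }

    std::size_t loads() const { return m_loads; }

private:
    std::size_t m_capacity;
    std::size_t m_loads { 0 };
    std::vector<std::string> m_entries; // Front is the least recently used.
};

// Simulates `lookups` hyphenation lookups, each one walking all
// `localeCount` dictionaries in the same order.
std::size_t countLoads(std::size_t capacity, std::size_t localeCount, std::size_t lookups)
{
    ToyLRUCache cache(capacity);
    for (std::size_t i = 0; i < lookups; ++i) {
        for (std::size_t locale = 0; locale < localeCount; ++locale)
            cache.get("dict" + std::to_string(locale));
    }
    return cache.loads();
}
```

In this model, `countLoads(4, 3, 82)` gives 3 loads, while `countLoads(4, 21, 82)` gives 1722 because every single access misses. That is the same pattern as the numbers above. Raising the capacity to 21 brings the count back down to 21.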

While the TinyLRUCache capacity could be bumped, it should be noted that, at least in Debian packages, many of the locale variations for one specific language under /usr/share/hyphen are simply symbolic links to a single master dictionary file. For instance, there are 21 different Spanish locales under /usr/share/hyphen, from Bolivian to Venezuelan, but 20 of those files just link to the master hyph_es_ES.dic file.

Same for the English locales -- hyph_en_AU.dic and hyph_en_ZA.dic link to hyph_en_GB.dic.

So we should maybe also look into detecting symlinks when storing these filepaths in the availableLocales() HashMap.
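One possible way to detect the symlinks, sketched with C++17 `std::filesystem` rather than whatever file APIs the WebKit tree would actually use: resolve each path to its canonical form and group the paths that land on the same file. `groupByCanonicalPath` is a hypothetical name.

```cpp
#include <filesystem>
#include <map>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: groups dictionary paths by the file they resolve
// to once symlinks are followed.
// Key: canonical path. Value: the input paths that point at it.
std::map<std::string, std::vector<std::string>> groupByCanonicalPath(const std::vector<std::string>& paths)
{
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& path : paths) {
        std::error_code error;
        fs::path canonical = fs::canonical(path, error);
        if (error)
            continue; // Skip dangling links and missing entries.
        groups[canonical.string()].push_back(path);
    }
    return groups;
}
```

If the 21 Spanish paths were fed through something like this, they would collapse into a single group, so only one HyphenDictionary would need to be created and cached for all of them.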
