[Webkit-unassigned] [Bug 228858] New: white space atomization during parsing is expensive
bugzilla-daemon at webkit.org
Thu Aug 5 22:41:56 PDT 2021
https://bugs.webkit.org/show_bug.cgi?id=228858
Bug ID: 228858
Summary: white space atomization during parsing is expensive
Product: WebKit
Version: WebKit Local Build
Hardware: Unspecified
OS: Unspecified
Status: NEW
Severity: Normal
Priority: P2
Component: DOM
Assignee: webkit-unassigned at lists.webkit.org
Reporter: heycam at apple.com
CC: webkit-bug-importer at group.apple.com
We have a memory optimization where the HTML parser atomizes any text node string that consists entirely of white space. This is somewhat expensive, since we must loop over all the characters in the string three times:
* First, to check that all the characters are white space
* Second, to hash the string when looking up the atom hash table
* Third, to check the string for equality with any existing atom hash table entry
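As a rough illustration of the three passes listed above, here is a minimal standalone sketch. It is not WebKit's code: `atomTable`, `isHTMLSpace` and `atomizeIfAllWhitespace` are hypothetical names, `std::unordered_set` merely stands in for the real atom hash table, and the character set is assumed to be HTML's ASCII white space.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical stand-in for the atom hash table (an assumption for this sketch).
static std::unordered_set<std::string> atomTable;

// Assumed white space set: HTML's ASCII white space characters.
static bool isHTMLSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

// Returns the interned string, or nullptr if the text is not all white space.
const std::string* atomizeIfAllWhitespace(std::string_view text)
{
    // Pass 1: check that every character is white space.
    for (char c : text) {
        if (!isHTMLSpace(c))
            return nullptr;
    }

    // Pass 2 (hashing the string) and pass 3 (comparing it against an
    // existing entry with the same hash) both happen inside insert().
    auto result = atomTable.insert(std::string(text));

    // Element addresses in std::unordered_set stay valid across rehashing.
    return &*result.first;
}
```

Atomizing the same white space string twice returns the same pointer, but each call still pays for all three passes.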
Most white space strings we encounter have a limited form -- they consist of at most three or four runs of consecutive identical white space characters, e.g. it's common to see a newline followed by a number of space characters. We can take advantage of this by compressing the white space string into a simple run-length encoded form during the same pass that checks the string is entirely white space. If we keep a small, quickly searchable cache of recently atomized white space strings, keyed off the encoded form, we can reuse the result of atomizing an identical string and avoid both the hashing and the hash entry equality check.
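The idea can be sketched as follows. This is only an illustration under stated assumptions, not the WIP patch: the 16-bits-per-run packing, the limit of four runs, the 64-entry direct-mapped cache, and all names (`scanWhitespace`, `atomizeWhitespace`, `slowPathCount`, and so on) are hypothetical, and `std::unordered_set` stands in for the real atom table.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_set>

enum class Scan { NotWhitespace, TooComplex, Encoded };
struct ScanResult { Scan kind; uint64_t code; };

// Maps each assumed white space character to a nonzero 3-bit id.
static int whitespaceIndex(char c)
{
    switch (c) {
    case ' ':  return 1;
    case '\t': return 2;
    case '\n': return 3;
    case '\f': return 4;
    case '\r': return 5;
    default:   return 0;
    }
}

// One pass: verify the text is all white space and pack up to four
// (character id, run length) pairs into a 64-bit code, 16 bits per run.
ScanResult scanWhitespace(std::string_view text)
{
    constexpr int maxRuns = 4;
    constexpr uint64_t maxRunLength = (1u << 13) - 1;
    uint64_t code = 0;
    int runs = 0;
    bool tooComplex = false;
    std::size_t i = 0;
    while (i < text.size()) {
        int id = whitespaceIndex(text[i]);
        if (!id)
            return { Scan::NotWhitespace, 0 };
        std::size_t start = i;
        while (i < text.size() && text[i] == text[start])
            ++i;
        uint64_t length = i - start;
        if (runs == maxRuns || length > maxRunLength)
            tooComplex = true; // keep scanning: the rest must still be validated
        else {
            code = (code << 16) | (static_cast<uint64_t>(id) << 13) | length;
            ++runs;
        }
    }
    if (tooComplex)
        return { Scan::TooComplex, 0 };
    return { Scan::Encoded, code };
}

struct CacheEntry { uint64_t code = 0; const std::string* atom = nullptr; };
static std::array<CacheEntry, 64> cache;           // hypothetical cache size
static std::unordered_set<std::string> atomTable;  // stand-in atom table
static unsigned slowPathCount = 0;                 // counts atom table lookups

const std::string* atomizeWhitespace(std::string_view text)
{
    ScanResult scan = scanWhitespace(text);
    if (scan.kind == Scan::NotWhitespace)
        return nullptr;
    if (scan.kind == Scan::Encoded) {
        CacheEntry& entry = cache[scan.code % cache.size()];
        // Equal codes imply equal strings, so no character comparison is needed.
        if (entry.atom && entry.code == scan.code)
            return entry.atom;
        ++slowPathCount;
        entry.code = scan.code;
        entry.atom = &*atomTable.insert(std::string(text)).first;
        return entry.atom;
    }
    // Too many runs or a run too long to encode: use the atom table directly.
    ++slowPathCount;
    return &*atomTable.insert(std::string(text)).first;
}
```

In this sketch a cache hit costs one pass over the characters plus a single integer comparison. Strings that do not fit the encoding still atomize correctly through the slow path. A real implementation would likely pick the cache index with a better mixing function than a plain modulo.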
I have a WIP patch for this that shows a 1% improvement on Speedometer 2 overall (due to 2-4% improvements on a few of the subtests) and no change to PLT5 on my local machine.