[Webkit-unassigned] [Bug 228858] New: white space atomization during parsing is expensive

bugzilla-daemon at webkit.org
Thu Aug 5 22:41:56 PDT 2021


https://bugs.webkit.org/show_bug.cgi?id=228858

            Bug ID: 228858
           Summary: white space atomization during parsing is expensive
           Product: WebKit
           Version: WebKit Local Build
          Hardware: Unspecified
                OS: Unspecified
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: DOM
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: heycam at apple.com
                CC: webkit-bug-importer at group.apple.com

We have a memory optimization where the HTML parser atomizes any text node string that is all white space.  This is a bit expensive, since we must loop over all the characters in the string three times:

* First, to check that all the characters are white space
* Second, to hash the string when looking up the atom hash table
* Third, to check the string for equality with any existing atom hash table entry
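For illustration, the three passes can be sketched roughly as follows. This is not WebKit code: `NaiveAtomTable` and `atomizeIfAllWhitespace` are hypothetical names, and `std::unordered_set` stands in for the real atom hash table.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// HTML white space: space, tab, line feed, form feed, carriage return.
inline bool isHTMLSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

// Hypothetical stand-in for the atom table (assumption, not WebKit's API).
class NaiveAtomTable {
public:
    // Returns the shared copy of the string, or nullptr if it is not all white space.
    const std::string* atomizeIfAllWhitespace(std::string_view text)
    {
        // Pass 1: check that every character is white space.
        for (char c : text) {
            if (!isHTMLSpace(c))
                return nullptr;
        }
        // Pass 2 happens inside insert(): hashing reads every character.
        // Pass 3 also happens inside insert(): if an entry with the same hash
        // exists, the equality check reads every character again.
        // (The temporary std::string copy is an artifact of this sketch.)
        return &*m_atoms.insert(std::string(text)).first;
    }

    std::size_t size() const { return m_atoms.size(); }

private:
    std::unordered_set<std::string> m_atoms;
};
```

Atomizing the same indentation string twice returns the same pointer, but each call still pays for all three passes.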

Most white space strings we encounter have a limited form -- they have at most three or four runs of consecutive equal white space characters, e.g. it's common to see a newline followed by a number of space characters.  We can take advantage of this by compressing the white space string into a simple run-length encoded form while we check that the string is entirely white space.  If we keep a cache of recently atomized white space strings that can be quickly looked up, keyed off the encoded form, we can re-use a previous result of atomizing an identical string and avoid the hashing and hash entry equality checks.
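A minimal sketch of this idea, under stated assumptions, is below. The names (`scanWhitespace`, `WhitespaceAtomCache`), the 64-bit key layout (up to four runs, each packed as an 8-bit character and an 8-bit length), and the 64-entry direct-mapped cache are all hypothetical choices made for illustration. `std::unordered_set` again stands in for the atom hash table, and the WIP patch may be structured quite differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_set>

inline bool isHTMLSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

enum class Scan { NotWhitespace, Unencodable, Encoded };

struct ScanResult {
    Scan kind;
    uint64_t key; // Meaningful only when kind == Scan::Encoded.
};

// One pass: checks that the string is all white space and, at the same time,
// run-length encodes it into a 64-bit key (assumed layout: 16 bits per run).
ScanResult scanWhitespace(std::string_view text)
{
    constexpr int maxRuns = 4;
    constexpr std::size_t maxRunLength = 255;
    uint64_t key = 0;
    int runs = 0;
    bool encodable = !text.empty();
    std::size_t i = 0;
    while (i < text.size()) {
        char c = text[i];
        if (!isHTMLSpace(c))
            return { Scan::NotWhitespace, 0 };
        std::size_t runLength = 1;
        while (i + runLength < text.size() && text[i + runLength] == c)
            ++runLength;
        i += runLength;
        if (encodable && runs < maxRuns && runLength <= maxRunLength) {
            key = (key << 16) | (uint64_t(static_cast<unsigned char>(c)) << 8) | runLength;
            ++runs;
        } else
            encodable = false; // Keep scanning: we still need the white space check.
    }
    return { encodable ? Scan::Encoded : Scan::Unencodable, key };
}

class WhitespaceAtomCache {
public:
    // Returns the shared copy, or nullptr if the text is not all white space.
    const std::string* atomize(std::string_view text)
    {
        ScanResult scan = scanWhitespace(text);
        if (scan.kind == Scan::NotWhitespace)
            return nullptr;
        if (scan.kind == Scan::Unencodable)
            return slowAtomize(text);
        Entry& entry = m_entries[scan.key % m_entries.size()];
        if (entry.key == scan.key) {
            // Cache hit: no hashing and no string comparison needed.
            ++m_hits;
            return entry.atom;
        }
        entry.key = scan.key;
        entry.atom = slowAtomize(text);
        return entry.atom;
    }

    std::size_t hits() const { return m_hits; }

private:
    struct Entry {
        uint64_t key = 0; // 0 is never a valid key: every encoded run is nonzero.
        const std::string* atom = nullptr;
    };

    // Slow path: hash and compare in the (stand-in) atom table.
    const std::string* slowAtomize(std::string_view text)
    {
        return &*m_atoms.insert(std::string(text)).first;
    }

    std::array<Entry, 64> m_entries {};
    std::unordered_set<std::string> m_atoms;
    std::size_t m_hits = 0;
};
```

Because adjacent runs always differ in character and each packed run is nonzero, two different strings never produce the same key in this layout, so comparing keys is enough to confirm a cache hit. Strings with more than four runs, or with a run longer than 255 characters, simply fall back to the slow path.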

I have a WIP patch for this that is showing a 1% improvement on Speedometer 2 overall (due to 2-4% improvements on a few of the subtests) and no change to PLT5 on my local machine.
