[webkit-dev] DOM tree traversal on detached nodes

Kentaro Hara haraken at chromium.org
Wed Jun 27 05:55:08 PDT 2012


I wrote a document and implemented a patch of a new reference counting
algorithm. The reference counting algorithm guarantees that "If a Node
X has 1~ reference count, then all the Nodes in the same tree are kept
alive". No performance regression. No additional byte in each Node
object.

A design document:
https://docs.google.com/a/chromium.org/document/d/1uYHpq7u5Sslj54UgzXjA7pYR53XjidpBcrCa-neOGQs/edit?pli=1

A patch: https://bugs.webkit.org/show_bug.cgi?id=88834

Comments are appreciated! (You can add a comment on the document side
by side, or post a comment to bug 88834.)

Thanks!


On Tue, Jun 12, 2012 at 4:07 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>
> On Jun 11, 2012, at 11:35 PM, Ojan Vafai <ojan at chromium.org> wrote:
>
> On Mon, Jun 11, 2012 at 9:32 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>
>> I think that a lot of the performance penalty can be alleviated by:
>>
>> 1) Moving rarely executed paths into non-inlineable methods.
>>
>> 2) Marking the various refing methods ALWAYS_INLINE, after you've moved as
>> much as possible out of line.
>>
>> 3) Using LIKELY and UNLIKELY where appropriate.
>
>
> We should only add these when we can point to a measurable benefit IMO
> (maybe that's what you meant by "appropriate"?). My experience is that
> marking codepaths LIKELY/UNLIKELY very rarely has a measurable performance
> improvement with modern compilers + x86 processors. I can't speak for arm
> since I have less direct experience optimizing for arm.
>
>
> It's more for the benefit of the compiler than the hardware. It can, for
> example, improve register allocation if the compiler knows that in case of
> interference between variables used predominantly on cold paths and
> variables used predominantly on hot paths, it should spill the cold ones.
> The following is a good example:
>
> x = some expression;
> if (--thingy) doStuff(); // not inlineable
> use x;
>
> The function call will interfere with all caller-save registers. Which is
> most registers. The compiler can do one of three things: (1) spill x, (2)
> put x in callee-save register, of which there is a limited supply, or (3)
> put x in a caller save but insert save/restore code around the doStuff()
> call. Last I checked, gcc would prefer (2) or (1) if it ran out of registers
> unless it knew that doStuff() was called rarely. Hence the utility of using
> UNLIKEY in this case. If you know that it is unlikely then you want the
> compiler to go with option (3), which greatly relieves register pressure at
> the cost of making the rare call slightly slower.
>
> You're right that it's not a huge win or even a measurable win in many
> cases.  The conditions have to be just right - for one it has to be a case
> where the compiler wasn't already smart enough to do the right thing.  And
> current compilers are quite smart indeed so the benefits of these macros
> will converge to zero over time.
>
> In my experience it's a benefit in inline functions that get called from
> many places, since after a lot of inlining the compiler can use all the help
> it can get in trying to unravel which parts of the code matter and which
> don't. I've seen it result in a 5% progression in the past, though
> admittedly it was in runtime functions that get inlined *everywhere* like
> allocation and GC write barriers.
>
> In this particular case, I have a hunch that these macros might make a
> difference. It's worth a shot since adding them doesn't require much
> effort.
>
>
>> The reason I suggest these things is that you're adding a lot of code in a
>> bunch of methods, but a lot of that logic looks like it will execute
>> infrequently. That reduces inlining opportunities. And in places where the
>> methods are still inlined, you're adding code bloat that reduces icache
>> locality and may blow out branch prediction caches.
>>
>> I'm also not sure that your benchmark is conclusive. What does the perf
>> test have in common with things we know we care about, like say PLT?
>>
>> -Filip
>>
>> On Jun 11, 2012, at 8:45 PM, Kentaro Hara <haraken at chromium.org> wrote:
>>
>> > Thanks ggaren!
>> >
>> > I filed a bug here: https://bugs.webkit.org/show_bug.cgi?id=88834
>> >
>> >
>> >> I believe you could achieve (a) (i.e., preserve all reachable nodes
>> >> without help from the JavaScript garbage collector) with these semantics in
>> >> the C++ DOM:
>> >>
>> >> ===== Design =====
>> >>
>> >> ref():
>> >>
>> >> ++this->refCount
>> >> ++root()->refCount
>> >>
>> >> deref():
>> >>
>> >> --this->refCount
>> >> --root()->refCount
>> >
>> > Actually I've tried the approach yesterday but confirmed 25.9%
>> > performance regression, even if I have TreeShared hold a pointer to
>> > the root. Please look at the WIP patch and the performance test in
>> > https://bugs.webkit.org/show_bug.cgi?id=88834.
>> >
>> > What I've learned is that we must not insert any line to ref() and
>> > deref(), which are performance sensitive:)
>> >
>> > I am now trying another approach that hacks Node destruction. Let's
>> > discuss the implementation details in the bug.
>> >
>> > Thanks!
>> >
>> >
>> > On Tue, Jun 12, 2012 at 12:18 PM, Geoffrey Garen <ggaren at apple.com>
>> > wrote:
>> >>
>> >> IMHO, (a) would be the best semantics.
>> >>
>> >>
>> >> I agree, and I think that the specification actually does require this.
>> >>
>> >> The real issue here is how to fix this bug efficiently. I believe that
>> >> Geoff Garen has been contemplating this and also has a proposal for how to
>> >> do it.
>> >>
>> >>
>> >> I believe you could achieve (a) (i.e., preserve all reachable nodes
>> >> without help from the JavaScript garbage collector) with these semantics in
>> >> the C++ DOM:
>> >>
>> >> ===== Design =====
>> >>
>> >> ref():
>> >>
>> >> ++this->refCount
>> >> ++root()->refCount
>> >>
>> >> deref():
>> >>
>> >> --this->refCount
>> >> --root()->refCount
>> >>
>> >> if !root()->refCount:
>> >> assert(!this->refCount)
>> >> for each descendent of root():
>> >> delete descendent
>> >> delete root()
>> >>
>> >> Remove 'this' from tree:
>> >>
>> >> assert(this->refCount) // clients must put a node into a RefPtr to
>> >> remove it from a tree
>> >>
>> >> root()->refCount -= this->refCount
>> >>
>> >> for each descendent of this:
>> >> this->refCount += descendent->refCount
>> >>
>> >> Insert 'this' into tree:
>> >>
>> >> root()->refCount += this->refCount
>> >>
>> >> for each descendent of this:
>> >> this->refCount -= descendent->refCount
>> >>
>> >> What is "root()"?:
>> >>
>> >> If this is in a document:
>> >> root() == document()
>> >>
>> >> else:
>> >> root() == the highest link in the parentNode chain
>> >>
>> >>
>> >> ===== FAQ =====
>> >>
>> >> Cycles?
>> >>
>> >> No cycles are possible because the DOM API does not allow cycles in a
>> >> DOM tree.
>> >>
>> >> Is O(n) subtree insertion and removal OK?
>> >>
>> >> I believe these operations are already O(n).
>> >>
>> >> Is average case O(log n) access to disconnected root() OK?
>> >>
>> >> If n is small and/or ref/deref of disconnected subnodes are rare, then
>> >> yes.
>> >>
>> >> Otherwise, we can optimize this case by putting a direct pointer to
>> >> root() in rare data.
>> >>
>> >> Geoff
>> >
>> >
>> >
>> > --
>> > Kentaro Hara, Tokyo, Japan (http://haraken.info)
>> > _______________________________________________
>> > webkit-dev mailing list
>> > webkit-dev at lists.webkit.org
>> > http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev at lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>
>



-- 
Kentaro Hara, Tokyo, Japan (http://haraken.info)


More information about the webkit-dev mailing list