[webkit-dev] DOM tree traversal on detached nodes

Wed Jun 27 10:39:31 PDT 2012

>From your design document, it sounds like this approach retains guardRef/selfOnlyRef, and will not let a disconnected subtree keep the owner document's children alive. Am I understanding correctly?

 - Maciej

On Jun 27, 2012, at 5:55 AM, Kentaro Hara <haraken at chromium.org> wrote:

> I wrote a document and implemented a patch of a new reference counting
> algorithm. The reference counting algorithm guarantees that "If a Node
> X has 1~ reference count, then all the Nodes in the same tree are kept
> alive". No performance regression. No additional byte in each Node
> object.
> 
> A design document:
> https://docs.google.com/a/chromium.org/document/d/1uYHpq7u5Sslj54UgzXjA7pYR53XjidpBcrCa-neOGQs/edit?pli=1
> 
> A patch: https://bugs.webkit.org/show_bug.cgi?id=88834
> 
> Comments are appreciated! (You can add a comment on the document side
> by side, or post a comment to bug 88834.)
> 
> Thanks!
> 
> 
> On Tue, Jun 12, 2012 at 4:07 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>> 
>> On Jun 11, 2012, at 11:35 PM, Ojan Vafai <ojan at chromium.org> wrote:
>> 
>> On Mon, Jun 11, 2012 at 9:32 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>> 
>>> I think that a lot of the performance penalty can be alleviated by:
>>> 
>>> 1) Moving rarely executed paths into non-inlineable methods.
>>> 
>>> 2) Marking the various refing methods ALWAYS_INLINE, after you've moved as
>>> much as possible out of line.
>>> 
>>> 3) Using LIKELY and UNLIKELY where appropriate.
>> 
>> 
>> We should only add these when we can point to a measurable benefit IMO
>> (maybe that's what you meant by "appropriate"?). My experience is that
>> marking codepaths LIKELY/UNLIKELY very rarely has a measurable performance
>> improvement with modern compilers + x86 processors. I can't speak for arm
>> since I have less direct experience optimizing for arm.
>> 
>> 
>> It's more for the benefit of the compiler than the hardware. It can, for
>> example, improve register allocation if the compiler knows that in case of
>> interference between variables used predominantly on cold paths and
>> variables used predominantly on hot paths, it should spill the cold ones.
>> The following is a good example:
>> 
>> x = some expression;
>> if (--thingy) doStuff(); // not inlineable
>> use x;
>> 
>> The function call will interfere with all caller-save registers. Which is
>> most registers. The compiler can do one of three things: (1) spill x, (2)
>> put x in callee-save register, of which there is a limited supply, or (3)
>> put x in a caller save but insert save/restore code around the doStuff()
>> call. Last I checked, gcc would prefer (2) or (1) if it ran out of registers
>> unless it knew that doStuff() was called rarely. Hence the utility of using
>> UNLIKEY in this case. If you know that it is unlikely then you want the
>> compiler to go with option (3), which greatly relieves register pressure at
>> the cost of making the rare call slightly slower.
>> 
>> You're right that it's not a huge win or even a measurable win in many
>> cases.  The conditions have to be just right - for one it has to be a case
>> where the compiler wasn't already smart enough to do the right thing.  And
>> current compilers are quite smart indeed so the benefits of these macros
>> will converge to zero over time.
>> 
>> In my experience it's a benefit in inline functions that get called from
>> many places, since after a lot of inlining the compiler can use all the help
>> it can get in trying to unravel which parts of the code matter and which
>> don't. I've seen it result in a 5% progression in the past, though
>> admittedly it was in runtime functions that get inlined *everywhere* like
>> allocation and GC write barriers.
>> 
>> In this particular case, I have a hunch that these macros might make a
>> difference. It's worth a shot since adding them doesn't require much
>> effort.
>> 
>> 
>>> The reason I suggest these things is that you're adding a lot of code in a
>>> bunch of methods, but a lot of that logic looks like it will execute
>>> infrequently. That reduces inlining opportunities. And in places where the
>>> methods are still inlined, you're adding code bloat that reduces icache
>>> locality and may blow out branch prediction caches.
>>> 
>>> I'm also not sure that your benchmark is conclusive. What does the perf
>>> test have in common with things we know we care about, like say PLT?
>>> 
>>> -Filip
>>> 
>>> On Jun 11, 2012, at 8:45 PM, Kentaro Hara <haraken at chromium.org> wrote:
>>> 
>>>> Thanks ggaren!
>>>> 
>>>> I filed a bug here: https://bugs.webkit.org/show_bug.cgi?id=88834
>>>> 
>>>> 
>>>>> I believe you could achieve (a) (i.e., preserve all reachable nodes
>>>>> without help from the JavaScript garbage collector) with these semantics in
>>>>> the C++ DOM:
>>>>> 
>>>>> ===== Design =====
>>>>> 
>>>>> ref():
>>>>> 
>>>>> ++this->refCount
>>>>> ++root()->refCount
>>>>> 
>>>>> deref():
>>>>> 
>>>>> --this->refCount
>>>>> --root()->refCount
>>>> 
>>>> Actually I've tried the approach yesterday but confirmed 25.9%
>>>> performance regression, even if I have TreeShared hold a pointer to
>>>> the root. Please look at the WIP patch and the performance test in
>>>> https://bugs.webkit.org/show_bug.cgi?id=88834.
>>>> 
>>>> What I've learned is that we must not insert any line to ref() and
>>>> deref(), which are performance sensitive:)
>>>> 
>>>> I am now trying another approach that hacks Node destruction. Let's
>>>> discuss the implementation details in the bug.
>>>> 
>>>> Thanks!
>>>> 
>>>> 
>>>> On Tue, Jun 12, 2012 at 12:18 PM, Geoffrey Garen <ggaren at apple.com>
>>>> wrote:
>>>>> 
>>>>> IMHO, (a) would be the best semantics.
>>>>> 
>>>>> 
>>>>> I agree, and I think that the specification actually does require this.
>>>>> 
>>>>> The real issue here is how to fix this bug efficiently. I believe that
>>>>> Geoff Garen has been contemplating this and also has a proposal for how to
>>>>> do it.
>>>>> 
>>>>> 
>>>>> I believe you could achieve (a) (i.e., preserve all reachable nodes
>>>>> without help from the JavaScript garbage collector) with these semantics in
>>>>> the C++ DOM:
>>>>> 
>>>>> ===== Design =====
>>>>> 
>>>>> ref():
>>>>> 
>>>>> ++this->refCount
>>>>> ++root()->refCount
>>>>> 
>>>>> deref():
>>>>> 
>>>>> --this->refCount
>>>>> --root()->refCount
>>>>> 
>>>>> if !root()->refCount:
>>>>> assert(!this->refCount)
>>>>> for each descendent of root():
>>>>> delete descendent
>>>>> delete root()
>>>>> 
>>>>> Remove 'this' from tree:
>>>>> 
>>>>> assert(this->refCount) // clients must put a node into a RefPtr to
>>>>> remove it from a tree
>>>>> 
>>>>> root()->refCount -= this->refCount
>>>>> 
>>>>> for each descendent of this:
>>>>> this->refCount += descendent->refCount
>>>>> 
>>>>> Insert 'this' into tree:
>>>>> 
>>>>> root()->refCount += this->refCount
>>>>> 
>>>>> for each descendent of this:
>>>>> this->refCount -= descendent->refCount
>>>>> 
>>>>> What is "root()"?:
>>>>> 
>>>>> If this is in a document:
>>>>> root() == document()
>>>>> 
>>>>> else:
>>>>> root() == the highest link in the parentNode chain
>>>>> 
>>>>> 
>>>>> ===== FAQ =====
>>>>> 
>>>>> Cycles?
>>>>> 
>>>>> No cycles are possible because the DOM API does not allow cycles in a
>>>>> DOM tree.
>>>>> 
>>>>> Is O(n) subtree insertion and removal OK?
>>>>> 
>>>>> I believe these operations are already O(n).
>>>>> 
>>>>> Is average case O(log n) access to disconnected root() OK?
>>>>> 
>>>>> If n is small and/or ref/deref of disconnected subnodes are rare, then
>>>>> yes.
>>>>> 
>>>>> Otherwise, we can optimize this case by putting a direct pointer to
>>>>> root() in rare data.
>>>>> 
>>>>> Geoff
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Kentaro Hara, Tokyo, Japan (http://haraken.info)
>>>> _______________________________________________
>>>> webkit-dev mailing list
>>>> webkit-dev at lists.webkit.org
>>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>> 
>> 
> 
> 
> 
> -- 
> Kentaro Hara, Tokyo, Japan (http://haraken.info)
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev