[webkit-dev] DOM tree traversal on detached nodes

Kentaro Hara haraken at chromium.org
Wed Jun 27 17:38:04 PDT 2012


> From your design document, it sounds like this approach retains guardRef/selfOnlyRef, and will not let a disconnected subtree keep the owner document's children alive. Am I understanding correctly?

Yes!

- The approach retains guardRef/selfOnlyRef.

- Assume that some node refers to a document X as its owner document. If
all of X's children have a zero reference count, then all of X's
children are reclaimed and only X is kept alive. Otherwise (i.e. at
least one of X's children has a non-zero reference count), X and all of
its children are kept alive (a sketch follows below).
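
For illustration, here is a minimal self-contained sketch of that reclamation
rule (toy types and a hypothetical helper, not the actual code in the patch):

    #include <vector>

    struct ToyNode {
        int refCount = 0;
        std::vector<ToyNode*> children;
    };

    // A document's subtree may be reclaimed only if no node in it holds a
    // non-zero reference count; a single referenced node keeps the whole
    // subtree (and the document) alive.
    bool subtreeIsUnreferenced(const ToyNode* node)
    {
        if (node->refCount > 0)
            return false;
        for (const ToyNode* child : node->children) {
            if (!subtreeIsUnreferenced(child))
                return false;
        }
        return true;
    }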


On Thu, Jun 28, 2012 at 2:39 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>
> From your design document, it sounds like this approach retains guardRef/selfOnlyRef, and will not let a disconnected subtree keep the owner document's children alive. Am I understanding correctly?
>
>  - Maciej
>
> On Jun 27, 2012, at 5:55 AM, Kentaro Hara <haraken at chromium.org> wrote:
>
>> I wrote a design document and implemented a patch for a new reference
>> counting algorithm. The algorithm guarantees that "if a Node X has a
>> reference count of 1 or more, then all the Nodes in the same tree are
>> kept alive". There is no performance regression and no additional bytes
>> per Node object.
>>
>> A design document:
>> https://docs.google.com/a/chromium.org/document/d/1uYHpq7u5Sslj54UgzXjA7pYR53XjidpBcrCa-neOGQs/edit?pli=1
>>
>> A patch: https://bugs.webkit.org/show_bug.cgi?id=88834
>>
>> Comments are appreciated! (You can comment directly on the document, or
>> post a comment to bug 88834.)
>>
>> Thanks!
>>
>>
>> On Tue, Jun 12, 2012 at 4:07 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>>
>>> On Jun 11, 2012, at 11:35 PM, Ojan Vafai <ojan at chromium.org> wrote:
>>>
>>> On Mon, Jun 11, 2012 at 9:32 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>>>
>>>> I think that a lot of the performance penalty can be alleviated by:
>>>>
>>>> 1) Moving rarely executed paths into non-inlineable methods.
>>>>
>>>> 2) Marking the various refing methods ALWAYS_INLINE, after you've moved as
>>>> much as possible out of line.
>>>>
>>>> 3) Using LIKELY and UNLIKELY where appropriate.
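>>>>
>>>> For illustration, a minimal sketch of the pattern in (1)-(3) (all names
>>>> are hypothetical, not actual WebKit code; the TOY_* macros stand in for
>>>> WTF's ALWAYS_INLINE and UNLIKELY):
>>>>
>>>>     #define TOY_ALWAYS_INLINE inline __attribute__((always_inline))
>>>>     #define TOY_UNLIKELY(x) __builtin_expect(!!(x), 0)
>>>>
>>>>     struct ToyNode { int refCount = 0; bool inDetachedTree = false; };
>>>>
>>>>     // Rarely executed path, deliberately kept in a separate out-of-line
>>>>     // function so the common path below stays small.
>>>>     void derefSlowCase(ToyNode*) { /* rare bookkeeping would live here */ }
>>>>
>>>>     TOY_ALWAYS_INLINE void derefToy(ToyNode* node)
>>>>     {
>>>>         if (TOY_UNLIKELY(node->inDetachedTree)) {
>>>>             derefSlowCase(node); // cold branch: out-of-line call
>>>>             return;
>>>>         }
>>>>         --node->refCount; // hot branch: stays tiny so callers keep it inlined
>>>>     }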
>>>
>>>
>>> IMO we should only add these when we can point to a measurable benefit
>>> (maybe that's what you meant by "appropriate"?). My experience is that
>>> marking code paths LIKELY/UNLIKELY very rarely yields a measurable
>>> performance improvement with modern compilers and x86 processors. I can't
>>> speak for ARM, since I have less direct experience optimizing for it.
>>>
>>>
>>> It's more for the benefit of the compiler than the hardware. It can, for
>>> example, improve register allocation if the compiler knows that in case of
>>> interference between variables used predominantly on cold paths and
>>> variables used predominantly on hot paths, it should spill the cold ones.
>>> The following is a good example:
>>>
>>> x = some expression;
>>> if (--thingy) doStuff(); // not inlineable
>>> use x;
>>>
>>> The function call will interfere with all caller-save registers, which is
>>> most registers. The compiler can do one of three things: (1) spill x, (2)
>>> put x in a callee-save register, of which there is a limited supply, or (3)
>>> put x in a caller-save register but insert save/restore code around the
>>> doStuff() call. Last I checked, gcc would prefer (2), or (1) if it ran out
>>> of registers, unless it knew that doStuff() was called rarely. Hence the
>>> utility of using UNLIKELY in this case. If you know that the call is
>>> unlikely, then you want the compiler to go with option (3), which greatly
>>> relieves register pressure at the cost of making the rare call slightly
>>> slower.
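>>>
>>> For instance, a sketch of that fragment with the hint added (UNLIKELY here
>>> is defined locally so the sketch stands alone; in WebKit the equivalent
>>> macro comes from WTF, and all other names are made up):
>>>
>>>     #define UNLIKELY(x) __builtin_expect(!!(x), 0)
>>>
>>>     int someExpression();
>>>     void doStuff(); // not inlineable
>>>     void use(int);
>>>
>>>     void example(int thingy)
>>>     {
>>>         int x = someExpression();
>>>         if (UNLIKELY(--thingy))
>>>             doStuff(); // cold call: the compiler can now pick option (3)
>>>         use(x);        // x stays in a caller-save register on the hot path
>>>     }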
>>>
>>> You're right that it's not a huge win or even a measurable win in many
>>> cases.  The conditions have to be just right - for one it has to be a case
>>> where the compiler wasn't already smart enough to do the right thing.  And
>>> current compilers are quite smart indeed so the benefits of these macros
>>> will converge to zero over time.
>>>
>>> In my experience it's a benefit in inline functions that get called from
>>> many places, since after a lot of inlining the compiler can use all the help
>>> it can get in trying to unravel which parts of the code matter and which
>>> don't. I've seen it result in a 5% progression in the past, though
>>> admittedly it was in runtime functions that get inlined *everywhere* like
>>> allocation and GC write barriers.
>>>
>>> In this particular case, I have a hunch that these macros might make a
>>> difference. It's worth a shot since adding them doesn't require much
>>> effort.
>>>
>>>
>>>> The reason I suggest these things is that you're adding a lot of code in a
>>>> bunch of methods, but a lot of that logic looks like it will execute
>>>> infrequently. That reduces inlining opportunities. And in places where the
>>>> methods are still inlined, you're adding code bloat that reduces icache
>>>> locality and may blow out branch prediction caches.
>>>>
>>>> I'm also not sure that your benchmark is conclusive. What does the perf
>>>> test have in common with things we know we care about, like say PLT?
>>>>
>>>> -Filip
>>>>
>>>> On Jun 11, 2012, at 8:45 PM, Kentaro Hara <haraken at chromium.org> wrote:
>>>>
>>>>> Thanks ggaren!
>>>>>
>>>>> I filed a bug here: https://bugs.webkit.org/show_bug.cgi?id=88834
>>>>>
>>>>>
>>>>>> I believe you could achieve (a) (i.e., preserve all reachable nodes
>>>>>> without help from the JavaScript garbage collector) with these semantics in
>>>>>> the C++ DOM:
>>>>>>
>>>>>> ===== Design =====
>>>>>>
>>>>>> ref():
>>>>>>
>>>>>>     ++this->refCount
>>>>>>     ++root()->refCount
>>>>>>
>>>>>> deref():
>>>>>>
>>>>>>     --this->refCount
>>>>>>     --root()->refCount
>>>>>
>>>>> Actually, I tried this approach yesterday and confirmed a 25.9%
>>>>> performance regression, even when I have TreeShared hold a pointer to
>>>>> the root. Please look at the WIP patch and the performance test in
>>>>> https://bugs.webkit.org/show_bug.cgi?id=88834.
>>>>>
>>>>> What I've learned is that we must not add any code to ref() and
>>>>> deref(), which are performance sensitive :)
>>>>>
>>>>> I am now trying another approach that hacks Node destruction. Let's
>>>>> discuss the implementation details in the bug.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> On Tue, Jun 12, 2012 at 12:18 PM, Geoffrey Garen <ggaren at apple.com>
>>>>> wrote:
>>>>>>
>>>>>> IMHO, (a) would be the best semantics.
>>>>>>
>>>>>>
>>>>>> I agree, and I think that the specification actually does require this.
>>>>>>
>>>>>> The real issue here is how to fix this bug efficiently. I believe that
>>>>>> Geoff Garen has been contemplating this and also has a proposal for how to
>>>>>> do it.
>>>>>>
>>>>>>
>>>>>> I believe you could achieve (a) (i.e., preserve all reachable nodes
>>>>>> without help from the JavaScript garbage collector) with these semantics in
>>>>>> the C++ DOM:
>>>>>>
>>>>>> ===== Design =====
>>>>>>
>>>>>> ref():
>>>>>>
>>>>>>     ++this->refCount
>>>>>>     ++root()->refCount
>>>>>>
>>>>>> deref():
>>>>>>
>>>>>>     --this->refCount
>>>>>>     --root()->refCount
>>>>>>
>>>>>>     if !root()->refCount:
>>>>>>         assert(!this->refCount)
>>>>>>         for each descendant of root():
>>>>>>             delete descendant
>>>>>>         delete root()
>>>>>>
>>>>>> Remove 'this' from tree:
>>>>>>
>>>>>>     assert(this->refCount) // clients must put a node into a RefPtr
>>>>>>                            // to remove it from a tree
>>>>>>
>>>>>>     root()->refCount -= this->refCount
>>>>>>
>>>>>>     for each descendant of this:
>>>>>>         this->refCount += descendant->refCount
>>>>>>
>>>>>> Insert 'this' into tree:
>>>>>>
>>>>>>     root()->refCount += this->refCount
>>>>>>
>>>>>>     for each descendant of this:
>>>>>>         this->refCount -= descendant->refCount
>>>>>>
>>>>>> What is "root()"?:
>>>>>>
>>>>>>     If this is in a document:
>>>>>>         root() == document()
>>>>>>
>>>>>>     else:
>>>>>>         root() == the highest link in the parentNode chain
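>>>>>>
>>>>>> For concreteness, here is a self-contained toy sketch of the ref()/deref()
>>>>>> part of these semantics (toy types and names, not WebKit's Node; it follows
>>>>>> the pseudocode above literally, so when 'this' is the root the same counter
>>>>>> is adjusted twice, which stays balanced across ref/deref):
>>>>>>
>>>>>>     #include <cassert>
>>>>>>     #include <vector>
>>>>>>
>>>>>>     struct ToyNode {
>>>>>>         int refCount = 0;
>>>>>>         ToyNode* parent = nullptr;
>>>>>>         std::vector<ToyNode*> children;
>>>>>>
>>>>>>         // The highest link in the parentNode chain; a document would
>>>>>>         // play this role for in-document nodes.
>>>>>>         ToyNode* root()
>>>>>>         {
>>>>>>             ToyNode* node = this;
>>>>>>             while (node->parent)
>>>>>>                 node = node->parent;
>>>>>>             return node;
>>>>>>         }
>>>>>>
>>>>>>         void ref()
>>>>>>         {
>>>>>>             ++refCount;
>>>>>>             ++root()->refCount;
>>>>>>         }
>>>>>>
>>>>>>         void deref()
>>>>>>         {
>>>>>>             --refCount;
>>>>>>             ToyNode* r = root();
>>>>>>             --r->refCount;
>>>>>>             if (!r->refCount) {
>>>>>>                 assert(!refCount);
>>>>>>                 deleteTree(r); // frees the root and every descendant
>>>>>>             }
>>>>>>         }
>>>>>>
>>>>>>         static void deleteTree(ToyNode* node)
>>>>>>         {
>>>>>>             for (ToyNode* child : node->children)
>>>>>>                 deleteTree(child);
>>>>>>             delete node;
>>>>>>         }
>>>>>>     };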
>>>>>>
>>>>>>
>>>>>> ===== FAQ =====
>>>>>>
>>>>>> Cycles?
>>>>>>
>>>>>> No cycles are possible because the DOM API does not allow cycles in a
>>>>>> DOM tree.
>>>>>>
>>>>>> Is O(n) subtree insertion and removal OK?
>>>>>>
>>>>>> I believe these operations are already O(n).
>>>>>>
>>>>>> Is average case O(log n) access to disconnected root() OK?
>>>>>>
>>>>>> If n is small and/or ref/deref of disconnected subnodes are rare, then
>>>>>> yes.
>>>>>>
>>>>>> Otherwise, we can optimize this case by putting a direct pointer to
>>>>>> root() in rare data.
>>>>>>
>>>>>> Geoff
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Kentaro Hara, Tokyo, Japan (http://haraken.info)
>>>
>>>
>>
>>
>>
>> --
>> Kentaro Hara, Tokyo, Japan (http://haraken.info)
>



-- 
Kentaro Hara, Tokyo, Japan (http://haraken.info)

