[webkit-dev] DOM tree traversal on detached nodes
Filip Pizlo
fpizlo at apple.com
Tue Jun 12 00:07:25 PDT 2012
On Jun 11, 2012, at 11:35 PM, Ojan Vafai <ojan at chromium.org> wrote:
> On Mon, Jun 11, 2012 at 9:32 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> I think that a lot of the performance penalty can be alleviated by:
>
> 1) Moving rarely executed paths into non-inlineable methods.
>
> 2) Marking the various refing methods ALWAYS_INLINE, after you've moved as much as possible out of line.
>
> 3) Using LIKELY and UNLIKELY where appropriate.
>
> We should only add these when we can point to a measurable benefit IMO (maybe that's what you meant by "appropriate"?). My experience is that marking codepaths LIKELY/UNLIKELY very rarely has a measurable performance improvement with modern compilers + x86 processors. I can't speak for arm since I have less direct experience optimizing for arm.
It's more for the benefit of the compiler than the hardware. It can, for example, improve register allocation if the compiler knows that in case of interference between variables used predominantly on cold paths and variables used predominantly on hot paths, it should spill the cold ones. The following is a good example:
x = some expression;
if (--thingy) doStuff(); // not inlineable
use x;
The function call will interfere with all caller-save registers. Which is most registers. The compiler can do one of three things: (1) spill x, (2) put x in callee-save register, of which there is a limited supply, or (3) put x in a caller save but insert save/restore code around the doStuff() call. Last I checked, gcc would prefer (2) or (1) if it ran out of registers unless it knew that doStuff() was called rarely. Hence the utility of using UNLIKEY in this case. If you know that it is unlikely then you want the compiler to go with option (3), which greatly relieves register pressure at the cost of making the rare call slightly slower.
You're right that it's not a huge win or even a measurable win in many cases. The conditions have to be just right - for one it has to be a case where the compiler wasn't already smart enough to do the right thing. And current compilers are quite smart indeed so the benefits of these macros will converge to zero over time.
In my experience it's a benefit in inline functions that get called from many places, since after a lot of inlining the compiler can use all the help it can get in trying to unravel which parts of the code matter and which don't. I've seen it result in a 5% progression in the past, though admittedly it was in runtime functions that get inlined *everywhere* like allocation and GC write barriers.
In this particular case, I have a hunch that these macros might make a difference. It's worth a shot since adding them doesn't require much effort.
>
> The reason I suggest these things is that you're adding a lot of code in a bunch of methods, but a lot of that logic looks like it will execute infrequently. That reduces inlining opportunities. And in places where the methods are still inlined, you're adding code bloat that reduces icache locality and may blow out branch prediction caches.
>
> I'm also not sure that your benchmark is conclusive. What does the perf test have in common with things we know we care about, like say PLT?
>
> -Filip
>
> On Jun 11, 2012, at 8:45 PM, Kentaro Hara <haraken at chromium.org> wrote:
>
> > Thanks ggaren!
> >
> > I filed a bug here: https://bugs.webkit.org/show_bug.cgi?id=88834
> >
> >
> >> I believe you could achieve (a) (i.e., preserve all reachable nodes without help from the JavaScript garbage collector) with these semantics in the C++ DOM:
> >>
> >> ===== Design =====
> >>
> >> ref():
> >>
> >> ++this->refCount
> >> ++root()->refCount
> >>
> >> deref():
> >>
> >> --this->refCount
> >> --root()->refCount
> >
> > Actually I've tried the approach yesterday but confirmed 25.9%
> > performance regression, even if I have TreeShared hold a pointer to
> > the root. Please look at the WIP patch and the performance test in
> > https://bugs.webkit.org/show_bug.cgi?id=88834.
> >
> > What I've learned is that we must not insert any line to ref() and
> > deref(), which are performance sensitive:)
> >
> > I am now trying another approach that hacks Node destruction. Let's
> > discuss the implementation details in the bug.
> >
> > Thanks!
> >
> >
> > On Tue, Jun 12, 2012 at 12:18 PM, Geoffrey Garen <ggaren at apple.com> wrote:
> >>
> >> IMHO, (a) would be the best semantics.
> >>
> >>
> >> I agree, and I think that the specification actually does require this.
> >>
> >> The real issue here is how to fix this bug efficiently. I believe that Geoff Garen has been contemplating this and also has a proposal for how to do it.
> >>
> >>
> >> I believe you could achieve (a) (i.e., preserve all reachable nodes without help from the JavaScript garbage collector) with these semantics in the C++ DOM:
> >>
> >> ===== Design =====
> >>
> >> ref():
> >>
> >> ++this->refCount
> >> ++root()->refCount
> >>
> >> deref():
> >>
> >> --this->refCount
> >> --root()->refCount
> >>
> >> if !root()->refCount:
> >> assert(!this->refCount)
> >> for each descendent of root():
> >> delete descendent
> >> delete root()
> >>
> >> Remove 'this' from tree:
> >>
> >> assert(this->refCount) // clients must put a node into a RefPtr to remove it from a tree
> >>
> >> root()->refCount -= this->refCount
> >>
> >> for each descendent of this:
> >> this->refCount += descendent->refCount
> >>
> >> Insert 'this' into tree:
> >>
> >> root()->refCount += this->refCount
> >>
> >> for each descendent of this:
> >> this->refCount -= descendent->refCount
> >>
> >> What is "root()"?:
> >>
> >> If this is in a document:
> >> root() == document()
> >>
> >> else:
> >> root() == the highest link in the parentNode chain
> >>
> >>
> >> ===== FAQ =====
> >>
> >> Cycles?
> >>
> >> No cycles are possible because the DOM API does not allow cycles in a DOM tree.
> >>
> >> Is O(n) subtree insertion and removal OK?
> >>
> >> I believe these operations are already O(n).
> >>
> >> Is average case O(log n) access to disconnected root() OK?
> >>
> >> If n is small and/or ref/deref of disconnected subnodes are rare, then yes.
> >>
> >> Otherwise, we can optimize this case by putting a direct pointer to root() in rare data.
> >>
> >> Geoff
> >
> >
> >
> > --
> > Kentaro Hara, Tokyo, Japan (http://haraken.info)
> > _______________________________________________
> > webkit-dev mailing list
> > webkit-dev at lists.webkit.org
> > http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.webkit.org/pipermail/webkit-dev/attachments/20120612/c0c685f8/attachment-0001.html>
More information about the webkit-dev
mailing list