[Webkit-unassigned] [Bug 98694] [Performance] Speed-up DOM tree traversal on ARM platforms

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Oct 19 17:00:08 PDT 2012


https://bugs.webkit.org/show_bug.cgi?id=98694





--- Comment #21 from Benjamin Poulain <benjamin at webkit.org>  2012-10-19 17:01:05 PST ---
(From update of attachment 169667)
View in context: https://bugs.webkit.org/attachment.cgi?id=169667&action=review

I am unconvinced. I think there is too much tradeoff.
I also think it should be by implementation at first, not just CPU(ARM).

> Source/WTF/ChangeLog:3
> +        [Performance] Speed-up DOM tree traversal on ARM platforms

The title needs to be updated, you are now also enabling this on x86 and x86_64.

> Source/WTF/wtf/Platform.h:1039
> +/* Prefetch to minimize cache-miss latency by moving DOM nodes into CPU cache before they are accessed.
> +   Since ARM & x86 seem to benefit most out of this optimizaiton, this is enabled only on those platforms.
> +   Note: This optimization is highly sensitive to CPU micro architecture design choises such as
> +   L0/L1/L2 cache size & latency, L0/L1/L2 cache line size, PLD and external memory (DDR) bandwidth/latency. */
> +#if !defined(ENABLE_PLD_DOM_OPTIMIZATION) && (CPU(X86) || CPU(X86_64) || CPU(ARM)) && COMPILER_SUPPORTS(BUILTIN_PREFETCH)
> +#define ENABLE_PLD_DOM_OPTIMIZATION 1
> +#endif

Such definition would be more suited inside WebCore than in Platform.h. You could probably move it to Node.h.
The name of the macro contains "PLD", which is an ARM instruction. Why not use "Prefetch" like for the builtin?

Typo: optimizaiton.
No need to say "(DDR)" for the memory.

> Source/WebCore/ChangeLog:25
> +        and external memory (DDR) latency.

"(DDR)" here is unnecessary.

> Source/WebCore/ChangeLog:41
> +        CSS Selector Tests (Total Score)        +16% (ARM), +11% (x86)
> +        Bindings/event-target-wrapper           -11% (ARM), -10% (x86)
> +        Bindings/undefined-get-element-by-id    -10% (ARM), -08% (x86)
> +        Bindings/undefined-id-getter            +01% (ARM), +14% (x86)
> +        DOM/Accessors                           +10% (ARM), +06% (x86)
> +        DOM/GetElement                          -09% (ARM), -10% (x86)
> +        DOM/CloneNodes                          -01% (ARM), -07% (x86)
> +        DOM/DOMTable                            +06% (ARM), +05% (x86)
> +        Dromaeo/dom-attr                        +05% (ARM), +03% (x86)
> +        Dromaeo/dom-modify                      -04% (ARM), -06% (x86)
> +        Dromaeo/jslib-attr-prototype            -00% (ARM), -05% (x86)
> +        Dromaeo/jslib-style-prototype           +04% (ARM), +19% (x86)
> +        CSS/StyleSheetInsert                    +31% (ARM), +07% (x86)
> +        Parser/query-selector-last              +09% (ARM), -04% (x86)

I quickly tried the patch on a ARMv7 core, here are my results:
-DOM/Accessors: -1.7%
-DOM/GetElement: -3.3%
-Dromaeo dom-attr: +2.8%

> Source/WebCore/dom/Node.h:108
> +const int prefetchStride = 7;

This should be unsigned.

> Source/WebCore/dom/Node.h:918
> +    for (int i = prefetchStride; i && previous; i--) {

This should be unsigned.

> Source/WebCore/dom/Node.h:934
> +    // Note: Since passing dangling/bad pointer to PLD results in NOP, it is ok to have
> +    // bad pointer in m_cpuCachePreloadTarget.

This is true for PLD, what about x86 prefetch?

> Source/WebCore/dom/Node.h:936
> +    if (LIKELY(m_cpuCachePreloadTarget != 0))

I am still unconvinced by this. :)

> Source/WebCore/dom/Node.h:937
> +        __builtin_prefetch((char *)(m_cpuCachePreloadTarget));

Use a C++ cast instead of a C cast.
__builtin_prefetch takes a void*

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


More information about the webkit-unassigned mailing list