[Webkit-unassigned] [Bug 24986] ARM JIT port

Thu May 7 01:35:21 PDT 2009

https://bugs.webkit.org/show_bug.cgi?id=24986

------- Comment #13 from oszi at inf.u-szeged.hu  2009-05-07 01:35 PDT -------
There was a short discussion earlier about why we do not use MacroAssembler to
generate JIT code. Although our position has not changed (we still think the
JIT has to be as fast as possible), we have implemented WREC for ARM using
MacroAssembler. You can check the code at:
http://code.staikos.net/cgi-bin/gitweb.cgi?p=webkit;a=shortlog;h=ossy/arm-port-MacroAssember_WREC 

Here are the results. All values are relative to our ARM port, in which WREC
generates hand-optimized JIT code.

Performance:
SunSpider is almost the same (~1.01x as slow).
RegExpDNA is 1.05x as slow.

Code size and memory:
MacroAssembler-based WREC generates 20% larger JIT code.
The 'JSC' binary is 2% larger.
RSS is almost the same (~0.5% larger).

Well, the numbers show that the MacroAssembler-based WREC is both slower and
larger than the hand-optimized one. The difference comes from the
pattern-driven machine code generation. Here is an example (from
'generatePatternCharacterPair'):

using MacroAssembler:
        ...
        int pair = ch1 | (ch2 << 16);
        failures.append(branch32(NotEqual, character, Imm32(pair)));
        ...
        mov     r0, #97 ; 0x61
        orr     r0, r0, #6750208
        cmp     r2, r0
        bne     0x426be0bc
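
The MacroAssembler listing spends two instructions (mov + orr) just building
the constant, because an ARM data-processing instruction can only encode an
8-bit value rotated right by an even number of bits. The C++ sketch below shows
one way a generic code generator might split a 32-bit constant into such
chunks. It is a simplified illustration, not the MacroAssembler's actual code:
the names splitIntoArmChunks and materializeImm32 are hypothetical, chunks that
wrap from bit 31 around to bit 0 are never formed, and alternatives such as mvn
or a literal-pool load are not considered.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not the MacroAssembler's real code): split a 32-bit
// constant into chunks that are each an 8-bit field starting at an even bit
// position, so every chunk is a valid ARM immediate. Simplification: chunks
// that wrap from bit 31 around to bit 0 are never formed.
std::vector<uint32_t> splitIntoArmChunks(uint32_t value) {
    std::vector<uint32_t> chunks;
    while (value != 0) {
        // Find the lowest set bit, then round down to an even position
        // because ARM only supports rotations by an even amount.
        unsigned shift = 0;
        while (((value >> shift) & 1u) == 0)
            ++shift;
        shift &= ~1u;
        uint32_t chunk = value & (0xFFu << shift);
        chunks.push_back(chunk);
        value &= ~chunk; // clear the bits this chunk covers
    }
    return chunks;
}

// Hypothetical: build 'value' in 'reg' with one mov followed by zero or more orr.
std::vector<std::string> materializeImm32(uint32_t value, const std::string& reg) {
    std::vector<uint32_t> chunks = splitIntoArmChunks(value);
    std::vector<std::string> code;
    if (chunks.empty()) {
        code.push_back("mov " + reg + ", #0");
        return code;
    }
    code.push_back("mov " + reg + ", #" + std::to_string(chunks[0]));
    for (std::size_t i = 1; i < chunks.size(); ++i)
        code.push_back("orr " + reg + ", " + reg + ", #" + std::to_string(chunks[i]));
    return code;
}
```

For the 'a'/'g' pair in the example, materializeImm32(0x00670061, "r0") yields
the same two instructions as the listing above: "mov r0, #97" and
"orr r0, r0, #6750208".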

hand-optimized:
        ...
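        if ((ch1 >> 8) || (ch2 >> 8)) {
                // general case: at least one character does not fit in 8 bits
                ...
        } else {
                // fast case: both characters fit in 8 bits, so both
                // (ch2 << 16) and ch1 are valid ARM immediates
                m_assembler.sub_r(tmpReg, character, m_assembler.imm(ch2 << 16));
                m_assembler.cmp_r(tmpReg, m_assembler.imm(ch1));
        }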
        failures.append(m_compiler.emitBranch(&m_assembler, ARMAssembler::NE));
        ...
        sub     r0, r2, #6750208
        cmp     r0, #97
        bne     0x426be0bc
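
Both listings test the same condition. The C++ sketch below, assuming that
'character' holds the two 16-bit characters loaded as one 32-bit word with the
first character in the low half (as ch1 | (ch2 << 16) implies), shows why: with
wrap-around unsigned arithmetic, character - (ch2 << 16) == ch1 holds exactly
when character == (ch1 | (ch2 << 16)), because the two halves never overlap.
The function names are illustrative only. This suggests that the 8-bit check in
the hand-optimized code is about immediate encoding, not correctness.

```cpp
#include <cstdint>

// Illustrative only. Assumes 'character' holds two 16-bit characters loaded
// as one 32-bit word, the first character (ch1) in the low half.

// MacroAssembler version: compare against the combined constant.
bool matchPairCombined(uint32_t character, uint16_t ch1, uint16_t ch2) {
    uint32_t pair = uint32_t(ch1) | (uint32_t(ch2) << 16);
    return character == pair;
}

// Hand-optimized version: subtract the high half, then compare with the low
// half. uint32_t arithmetic wraps modulo 2^32, like the ARM 'sub' instruction.
bool matchPairSubtract(uint32_t character, uint16_t ch1, uint16_t ch2) {
    uint32_t tmp = character - (uint32_t(ch2) << 16);
    return tmp == uint32_t(ch1);
}
```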

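This is a typical example of how hard it is to optimize inside a pattern.
MacroAssembler works with an 'Imm32' value, which hides an optimization
opportunity: an ARM data-processing instruction can only encode an 8-bit
immediate rotated by an even number of bits, so the combined constant in the
example (0x00670061) needs two instructions, while its halves (0x670000 and
0x61) each fit into a single instruction. The hand-optimized version can use
this fast case because the initial conditions are well known at that point.
This optimization could be implemented in 'branch32' (splitting the input,
checking the upper bytes, etc.), but this kind of 'Imm32' value occurs mostly
in RegExps, so in all other cases it would only add JIT generation overhead.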
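
To make the trade-off concrete, here is a C++ sketch of what doing this
optimization inside a generic compare-with-immediate helper might look like.
The names isArmImmediate and emitCompareImm are hypothetical, the helper
returns assembly text instead of emitting machine code, r0 is assumed to be a
free scratch register, and "ldr r0, =imm" (the assembler's literal-pool
pseudo-instruction) stands in for the general case. The sub/cmp path only
reproduces the Z flag of a direct compare, so it is valid for EQ/NE branches
such as the NotEqual branch in the example. The extra encodability checks run
for every immediate compare, which is the JIT generation overhead mentioned
above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: true if 'value' is an ARM data-processing immediate,
// i.e. an 8-bit value rotated right by an even amount (0, 2, ..., 30).
bool isArmImmediate(uint32_t value) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        // Rotating left by 'rot' undoes a rotate-right-by-'rot' encoding.
        uint32_t undone = rot == 0 ? value : (value << rot) | (value >> (32 - rot));
        if (undone <= 0xFFu)
            return true;
    }
    return false;
}

// Hypothetical branch32-style helper: returns ARM assembly text that sets the
// Z flag iff reg == imm (valid for EQ/NE branches; other flags may differ on
// the sub/cmp path). Assumes r0 is a free scratch register.
std::vector<std::string> emitCompareImm(const std::string& reg, uint32_t imm) {
    std::vector<std::string> code;
    uint32_t low = imm & 0x0000FFFFu;
    uint32_t high = imm & 0xFFFF0000u;
    if (isArmImmediate(imm)) {
        // Cheapest case: the whole constant fits in one instruction.
        code.push_back("cmp " + reg + ", #" + std::to_string(imm));
    } else if (isArmImmediate(low) && isArmImmediate(high)) {
        // The RegExp pair case: reg == high + low  <=>  reg - high == low.
        code.push_back("sub r0, " + reg + ", #" + std::to_string(high));
        code.push_back("cmp r0, #" + std::to_string(low));
    } else {
        // General case: load the constant from a literal pool, then compare.
        code.push_back("ldr r0, =" + std::to_string(imm));
        code.push_back("cmp " + reg + ", r0");
    }
    return code;
}
```

For the 'a'/'g' pair, emitCompareImm("r2", 0x00670061) returns "sub r0, r2,
#6750208" followed by "cmp r0, #97", the same sequence as the hand-optimized
listing.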

Implementing a MacroAssembler-based WREC was not hard because WREC uses very
simple JIT constructs. In contrast, the SquirrelFish bytecode needs more
diverse JIT constructs (calls, loops, math operators, etc.). Even though WREC
leaves very little room for optimizing the machine code, the hand-written JIT
is still about 5% faster. What, then, would the difference be between a
MacroAssembler-based and a hand-optimized version of the JavaScript JIT?

We understand that a uniform JIT is clean and easy to maintain, but is that a
good enough reason to give up a 1, 2, 5, or 10% performance gain?

Regards,
Gabor and Csaba
