[Webkit-unassigned] [Bug 24986] ARM JIT port
bugzilla-daemon at webkit.org
Thu May 7 01:35:21 PDT 2009
https://bugs.webkit.org/show_bug.cgi?id=24986
------- Comment #13 from oszi at inf.u-szeged.hu 2009-05-07 01:35 PDT -------
There was a short discussion earlier about why we do not use MacroAssembler
for generating JIT code. Although our position has not changed (i.e., we still
think that the JIT has to be as fast as possible), we have implemented WREC
for ARM using MacroAssembler. You can check the code at:
http://code.staikos.net/cgi-bin/gitweb.cgi?p=webkit;a=shortlog;h=ossy/arm-port-MacroAssember_WREC
Here are the results. All values are relative to our ARM port, where WREC
generates hand-optimized JIT code.
Performance:
SunSpider is almost the same (~1.01x as slow).
RegExpDNA is 1.05x as slow.
MacroAssembler-based WREC generates 20% bigger JIT code.
'JSC' binary size is 2% bigger.
RSS is almost the same (~ 0.5% bigger).
Well, the numbers say that the MacroAssembler-based WREC is both slower and
bigger than the hand-optimized one. The results definitely come from the
pattern-driven machine code generation. Here is an example (from
'generatePatternCharacterPair'):
using MacroAssembler:
...
int pair = ch1 | (ch2 << 16);
failures.append(branch32(NotEqual, character, Imm32(pair)));
...
mov r0, #97 ; 0x61
orr r0, r0, #6750208
cmp r2, r0
bne 0x426be0bc
hand-optimized:
...
if ((ch1 >> 8) || (ch2 >> 8)) {
    ...
} else {
    m_assembler.sub_r(tmpReg, character, m_assembler.imm(ch2 << 16));
    m_assembler.cmp_r(tmpReg, m_assembler.imm(ch1));
}
failures.append(m_compiler.emitBranch(&m_assembler, ARMAssembler::NE));
...
sub r0, r2, #6750208
cmp r0, #97
bne 0x426be0bc
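As a sanity check, the two sequences above can be modeled in plain C++ to see that they accept exactly the same input. This is only a sketch: 'matchesCombined' and 'matchesSplit' are hypothetical helper names that do not exist in WREC, and the model assumes 'ch1' and 'ch2' each fit in 16 bits, so that 'ch1 | (ch2 << 16)' equals 'ch1 + (ch2 << 16)'.

```cpp
#include <cstdint>

// Hypothetical model of the MacroAssembler sequence (mov, orr, cmp):
// build the combined 32-bit pair and compare the loaded value against it.
static bool matchesCombined(uint32_t character, uint32_t ch1, uint32_t ch2)
{
    uint32_t pair = ch1 | (ch2 << 16);
    return character == pair;
}

// Hypothetical model of the hand-optimized sequence (sub, cmp):
// subtract the high half first, then compare the rest with the low half.
// Unsigned arithmetic wraps modulo 2^32, just like the ARM 'sub'.
static bool matchesSplit(uint32_t character, uint32_t ch1, uint32_t ch2)
{
    uint32_t tmp = character - (ch2 << 16);
    return tmp == ch1;
}
```

In the listing above, 97 is ch1 and 6750208 is 103 << 16, so ch2 is 103. Both functions accept only the value 97 | (103 << 16).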
This is a typical example of how hard it is to optimize inside a pattern. While
MacroAssembler works with 'Imm32' data that hides an optimization
opportunity, the hand-optimized version can use the fast case (because the
initial conditions are well known at that point). This optimization could be
implemented in 'branch32' (splitting the input, checking the upper bytes,
etc.), but this kind of 'Imm32' data occurs mostly in RegExps, so in other
cases it would only add JIT generation overhead.
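To make the cost of such a check concrete, here is a minimal sketch of what the generator would have to test. It relies on the ARM rule that a data-processing immediate is an 8-bit value rotated right by an even amount. 'isArmImmediate' and 'canUseSplitCompare' are hypothetical names for illustration only; they are not functions of our port or of MacroAssembler.

```cpp
#include <cstdint>

// Returns true if 'value' can be encoded as a single ARM data-processing
// immediate, i.e. an 8-bit constant rotated right by an even amount.
static bool isArmImmediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        // Rotating left by 'rot' undoes a rotate-right by 'rot'.
        uint32_t undone = rot ? ((value << rot) | (value >> (32 - rot))) : value;
        if (undone <= 0xff)
            return true;
    }
    return false;
}

// Hypothetical fast-path test, mirroring the 'if ((ch1 >> 8) || (ch2 >> 8))'
// condition of the hand-optimized code: when both characters fit in 8 bits,
// 'ch2 << 16' and 'ch1' are each a valid immediate, so sub + cmp is enough.
static bool canUseSplitCompare(uint32_t ch1, uint32_t ch2)
{
    return !(ch1 >> 8) && !(ch2 >> 8);
}
```

For the pair in the example, 97 and 6750208 are both encodable on their own, while the combined constant 97 | 6750208 is not, which is why the MacroAssembler version needs the extra 'mov' and 'orr'.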
Implementing the MacroAssembler-based WREC was not a hard task because WREC
uses very simple JIT constructions. In contrast, the SquirrelFish bytecode uses
more diverse JIT constructions (calls, loops, math operators, etc.).
Although there is very little room in WREC for optimizing the machine code,
the performance gain of the hand-written JIT is about 5%. What, then, would
the difference be between a MacroAssembler-based and a hand-optimized version
of the JavaScript JIT?
We see that the uniform JIT is elegant and easy to maintain, but is this a good
reason not to go for a 1, 2, 5, or 10% performance gain?
Regards,
Gabor and Csaba