[webkit-dev] arm jit

Wed Jun 10 13:15:30 PDT 2009

--- On Wed, 6/10/09, Geoffrey Garen <ggaren at apple.com> wrote:

> I'm having a hard time understanding from your comment what optimization
> changes you think are appropriate, but if you can produce a patch that
> implements your idea, and shows a benefit on a benchmark, I'd be happy
> to review it.

Consider something like op_call.

This expands to 95 inline instructions on MIPS for the slow case
alone, 3 of which are calls to other functions. So it probably
takes thousands of clock cycles to execute.

IMHO it doesn't make sense to inline op_call because:

1. It's a huge amount of JIT code just to save three or four
instructions at runtime (call, return, and maybe some register
shuffling).

2. The code that is executed runs to thousands of instructions, so
saving three or four of them is a microscopic net win.

4. It makes the generated machine code MUCH larger because instead of
having one copy of this function that is written in C/C++ and
statically compiled, there is a separate copy of this code for every
instance of op_call, which makes the instruction cache much less
effective.

5. The generated machine code is weakly optimized, so instead of
calling code that is well optimized by the C/C++ compiler for MIPS, the
CPU is executing weakly optimized, dynamically generated code. Since the
code is weakly optimized, it is also much larger than it should be,
which also makes the instruction cache much less effective.

6. The JIT-generated code resides in the data cache, and must be
flushed to main memory, then the instruction cache must be invalidated
so the new code will load into the instruction cache. Because the
WebKit JIT seems to do lazy compilation of functions at call time
(instead of compiling all the functions in one pass), this requires the
data cache to be flushed and the instruction cache to be invalidated
every time a new function is generated, which further degrades
performance. This type of code generation strategy is OK for processors
with unified caches (or pseudo-unified on x86), but for RISC machines
with separate instruction and data caches, it's really awful.
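
To illustrate the cache-maintenance point, here is a minimal sketch (not
WebKit code) that counts cache-maintenance calls when code is flushed
after every generated function versus once per batch. CodeBuffer,
flushPending and countFlushes are hypothetical names made up for this
example. The only real API used is the GCC/Clang builtin
__builtin___clear_cache. The buffer is ordinary heap memory, which is an
assumption to keep the sketch portable; a real JIT would write into
executable pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: holds generated code and tracks which bytes
// still need cache maintenance before they can be executed.
class CodeBuffer {
public:
    explicit CodeBuffer(std::size_t capacity)
        : m_bytes(capacity), m_used(0), m_flushed(0), m_flushCalls(0) {}

    // Copy one function's machine code into the buffer without flushing.
    // Returns false if there is not enough room left.
    bool append(const void* code, std::size_t size) {
        if (size > m_bytes.size() - m_used)
            return false;
        std::memcpy(m_bytes.data() + m_used, code, size);
        m_used += size;
        return true;
    }

    // One cache-maintenance call covering everything appended since the
    // previous flush. Does nothing if no new bytes are pending.
    void flushPending() {
        if (m_flushed == m_used)
            return;
        char* begin = m_bytes.data() + m_flushed;
        char* end = m_bytes.data() + m_used;
        __builtin___clear_cache(begin, end); // GCC/Clang builtin
        m_flushed = m_used;
        ++m_flushCalls;
    }

    unsigned flushCalls() const { return m_flushCalls; }

private:
    std::vector<char> m_bytes;
    std::size_t m_used;
    std::size_t m_flushed;
    unsigned m_flushCalls;
};

// Emit three placeholder "functions" and report how many
// cache-maintenance calls were needed.
unsigned countFlushes(bool flushAfterEachFunction) {
    const unsigned char fakeFunction[16] = {0}; // placeholder bytes, never executed
    CodeBuffer buffer(1024);
    for (int i = 0; i < 3; ++i) {
        bool ok = buffer.append(fakeFunction, sizeof(fakeFunction));
        assert(ok);
        (void)ok;
        if (flushAfterEachFunction)
            buffer.flushPending(); // lazy, per-function strategy
    }
    buffer.flushPending(); // batch strategy: a single call at the end
    return buffer.flushCalls();
}
```

Calling countFlushes(true) models the per-function strategy and
countFlushes(false) models compiling everything in one pass. The sketch
only counts calls; it says nothing about how expensive each call is on a
given MIPS core.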

This is just one of the problems with the JIT on MIPS (and other RISC processors). If you're interested, I can elaborate more.
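
To put rough numbers on the code-size argument, here is a
back-of-the-envelope model. It is a sketch under stated assumptions, not
a measurement: the 95 instructions come from the slow-case count above,
while the 4 bytes per instruction (classic 32-bit MIPS encoding) and the
4 instructions of per-site call overhead are assumptions, and all names
are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: classic 32-bit MIPS encoding, 4 bytes per instruction.
constexpr std::size_t kBytesPerInstruction = 4;
// From the count quoted above: op_call slow case, inlined.
constexpr std::size_t kSlowCaseInstructions = 95;
// Assumption: call, return and register shuffling charged to each site.
constexpr std::size_t kCallOverheadInstructions = 4;

// Every call site carries its own copy of the slow case.
constexpr std::size_t inlinedBytes(std::size_t callSites) {
    return callSites * kSlowCaseInstructions * kBytesPerInstruction;
}

// One shared copy of the slow case plus a short call sequence per site.
constexpr std::size_t sharedStubBytes(std::size_t callSites) {
    return (kSlowCaseInstructions + callSites * kCallOverheadInstructions)
        * kBytesPerInstruction;
}

// With 1000 op_call sites: 380000 bytes inlined vs 16380 bytes shared.
static_assert(inlinedBytes(1000) == 380000, "inlined size");
static_assert(sharedStubBytes(1000) == 16380, "shared stub size");
```

Under these assumptions the inlined version is roughly 23 times larger
at 1000 call sites, which is the instruction-cache pressure I am
describing.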

If my client is willing to pay for optimization work, I will eventually submit patches.

Toshi
