[webkit-dev] reclaim of number cells

Maciej Stachowiak mjs at apple.com
Thu Jun 26 11:57:01 PDT 2008


On Jun 26, 2008, at 5:20 AM, Zoltan Herczeg wrote:

> Hi Maciej
>
> Thank you for your reply.
> The algorithm itself is very simple. Think about the following  
> expression:
> a = b + c + d. The result of (b+c) is just a temporay value, which  
> is not
> used anymore after this expression is evaluated. From technical  
> viewpoint,
> a shadow register file tracks those numbers cells, which reference  
> counter
> is exatly one. If a reference counter increases (caused by  
> instructions
> like mov, put_by_id, call...), we remove the particular number cell  
> from
> this shadow register file. However, if a new value is stored in a
> register, and a number cell is pointed by its shadow register, we can
> safely reclaim that number cell. The perfromance drawback is that we
> always perfrom this check after some selected instructions (like  
> add, sub,
> mul, div, mod, ...) even if  the current benchmark never allocates a
> number cell.

Is it possible to skip the check in the fast path for immediate  
numbers that these instructions have? That would probably remove the  
drawback.

>
> Your idea of 64 bit VM registers are interesting. That makes this  
> patch a
> complete waste. Since most of our systems are 64 bit systems, we  
> already
> have 64 bit VM registers :) That makes me wondering, how do you plan  
> to
> store a 64 bit pointer using a 64 bit VM register?

The only pointers that have to be stored will be into the JavaScript  
heap, and we can use VM allocation tricks to ensure that it is limited  
to 48 bits or so of real address information in the 64-bit case.  
(Collector memory already uses low-level mechanisms to allocate  
directly from the VM subsystem.)

  - Maciej

>
>
> Cheers,
> zoltan
>
>> Hi Zoltan,
>>
>> The performance results are certainly interesting. Would you mind
>> explaining the algorithm a bit more?
>>
>> Also, do you know why it is slowing down the tests that it does? In
>> general the tests that seem hurt in performance are ones that would
>> use a lot of immediate numbers. I can understand why they would not
>> benefit from reducing allocation of number cells, but why do they get
>> slower? If we understand that, we might be able to come up with a
>> version that has the upsides but not the downsides of your patch.
>>
>> I should also note, we are considering using 64-bit VM registers  
>> (with
>> a special encoding that well let us store a double, a pointer or an
>> int without a separate type tag, basically all non-double types would
>> look like a NaN but not the standard NaN, so the low 48 or so bits  
>> are
>> free.) In such a scheme, number cells would be GC-allocated only if
>> the number is stored in an object (or possibly not at all if we  
>> decide
>> we can make objects and arrays bigger).
>>
>> That's probably a few weeks off though, so I'd be interested in
>> whether we can make your patch work.
>>
>> Regards,
>> Maciej
>>
>> On Jun 26, 2008, at 2:38 AM, Zoltan Herczeg wrote:
>>
>>> Hi All,
>>>
>>> I am Zoltan Herczeg, working for the Dept. of Soft. Eng., Univ. of
>>> Szeged.
>>> I am currently investigating the JavaScriptCore engine to find some
>>> speed-up solutions. During my investigation, I have encountered an
>>> interesting issue for the Garbage Collector. Namely, in the case of
>>> some
>>> sunspider benchmark programs, the GC is frequently called. For
>>> example,
>>> the runtime of 3d-cube is 121 msec, and it calls the slow Collect()
>>> function about 170 times. Thus, the garbage collecting takes the  
>>> large
>>> portion of its runtime (~30%). Such benchmarks cunsume the available
>>> Number Cells in a short time, so the GC has a lot of work to reclaim
>>> these
>>> cells. My idea is to have a simple algorithm, which can reclaim many
>>> of
>>> these cells without the help of the GC. I have attached an
>>> experimental
>>> patch, and the performance-results as well. As you can see, it is a
>>> rather
>>> selective patch. Some programs benefit from it (25% performance
>>> improve
>>> for 3d-cube), while some do not (like controlflow-recursive, which
>>> does
>>> not allocate any number cells at all). Perhaps this approach could
>>> be a
>>> custom-speedup, which should be turned on, when a treshold value is
>>> reached. (Number of collects() in a given time is greater than a
>>> pre-defined value.) However, the mainloop (Machine:: PrivateExecute)
>>> should be duplicated in this case, which is not an easy task because
>>> of
>>> the computed gotos. Do you have any other idea to make this patch
>>> useful?
>>>
>>> Regards,
>>> Zoltan<gc-
>>> improve
>>> -34737
>>> .patch><compare.txt>_______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>
>
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev



More information about the webkit-dev mailing list