[webkit-dev] [Block Pointer] Deterministic Region Based Memory Manager
Filip Pizlo
fpizlo at apple.com
Sat Mar 5 21:30:21 PST 2016
Phil,
I would expect our GC to be much faster than shared_ptr. This shouldn’t really be surprising; it’s the expected behavior according to the GC literature. High-level languages avoid the kind of eager reference counting that shared_ptr does because it’s too expensive. I would expect a 2x-5x slow-down if we switched to anything that did reference counting.
You should take a look at our GC, and maybe read some of the major papers about GC. It’s awesome stuff. Here are a few papers that I consider good reading:
Some great ideas about high-throughput GC: http://www.cs.utexas.edu/users/mckinley/papers/mmtk-icse-2004.pdf <http://www.cs.utexas.edu/users/mckinley/papers/mmtk-icse-2004.pdf>
Some great ideas about low-latency GC: http://www.filpizlo.com/papers/pizlo-pldi2010-schism.pdf <http://www.filpizlo.com/papers/pizlo-pldi2010-schism.pdf>
Some great ideas about GC roots: http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-88-2.pdf <http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-88-2.pdf>
A good exploration of the limits of reference counting performance: http://research.microsoft.com/pubs/70475/tr-2007-104.pdf <http://research.microsoft.com/pubs/70475/tr-2007-104.pdf>
Anyway, you can’t ask us to change our code to use your memory manager. You can, however, try to get your memory manager to work in WebKit, and post a patch if you get it working. If that patch is an improvement - in the sense that both you and the reviewers can apply the patch and confirm that it is in fact a progression and doesn’t break anything - then this would be the kind of thing we would accept.
Having looked at your code a bit, I think that you’ll encounter the following problems:
- Your code uses std::mutex for synchronization. std::mutex is quite slow. You should look at WTF::Lock, it’s much better (as in, orders of magnitude better).
- Your code implements lifecycle management that is limited to reference counting. This is not adequate to support JS, DOM, and JIT semantics, which are based on solving arbitrary data flow equations over the reachability set.
- It’s not clear that your allocator results in fast path code that is competitive against either of the JSC GC’s allocators. Both of those require ~5 instructions in the common case. That instruction count includes the oversize object safety checks.
- It’s not clear that your allocator is compatible with JITing and standard JavaScript performance optimizations, which assume that values can be passed around as bits without calling into the runtime. A reference counter needs to do some kinds of memory operations on variable assignments. This is likely to be about a 2x-5x slow-down. I would expect a 2x slow-down if you did non-thread-safe reference counting, and 5x if you made it thread-safe.
-Filip
> On Mar 5, 2016, at 8:05 PM, Phil Bouchard <philippeb8 at gmail.com> wrote:
>
> On 03/05/2016 01:02 AM, Phil Bouchard wrote:
>> On 03/05/2016 12:49 AM, Filip Pizlo wrote:
>>>
>>> If you're right then you've resolved CS problems dating back to the
>>> 50's. Extraordinary claims require extraordinary evidence. You haven't
>>> provided any evidence.
>>
>> It wasn't easy to implement but it's done now so we can all move forward.
>>
>>> Replacing our GC with anything else is going to be incredibly difficult.
>>>
>>> We aren't going to be compelled by a comparison of our GC to something
>>> else if it isn't in the context of web workloads.
>>
>> So you're saying it's impossible? Is there a design document I could
>> start reading?
>>
>>
>> Regards,
>> -Phil
>> (Sorry if I don't reply... it's late)
>
> Believe it or not, I made a mistake in the fast_pool_allocator which allocates proxies. I wasn't using unitary size which was clogging the allocator. I fixed it and now block_ptr<> is *faster* than shared_ptr<>:
>
> make:
> auto_ptr: 25637845 ns
> shared_ptr: 26789342 ns
> block_ptr: 50487376 ns
>
> new:
> auto_ptr: 23139888 ns
> shared_ptr: 48198668 ns
> block_ptr: 39151845 ns
>
> So the performance comparison reduces to this. Now it's just a matter of finding out if block_ptr<> leaks in any way.
>
>
> Regards,
> -Phil
>
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org <mailto:webkit-dev at lists.webkit.org>
> https://lists.webkit.org/mailman/listinfo/webkit-dev <https://lists.webkit.org/mailman/listinfo/webkit-dev>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.webkit.org/pipermail/webkit-dev/attachments/20160305/61b6c599/attachment.html>
More information about the webkit-dev
mailing list