[jsc-dev] bmalloc freelist from JIT code

Thu Nov 2 13:58:00 PDT 2017

> On Nov 2, 2017, at 1:32 PM, Yusuke SUZUKI <utatane.tea at gmail.com> wrote:
> 
> 
> On Fri, Nov 3, 2017 at 5:18 Filip Pizlo <fpizlo at apple.com <mailto:fpizlo at apple.com>> wrote:
> 
>> On Nov 2, 2017, at 1:10 PM, Yusuke SUZUKI <utatane.tea at gmail.com <mailto:utatane.tea at gmail.com>> wrote:
>> 
>> Thanks1
>> 
>> On Fri, Nov 3, 2017 at 5:01 AM, Geoffrey Garen <ggaren at apple.com <mailto:ggaren at apple.com>> wrote:
>>> On Fri, Nov 3, 2017 at 4:49 AM, Filip Pizlo <fpizlo at apple.com <mailto:fpizlo at apple.com>> wrote:
>>> I think he means: expose whatever the bmalloc fast path is to the JIT.  I didn’t take it to mean literally add a free list.
>>> 
>>> Yeah, right. My question is: is that feasible to add an interface to allocate bmalloc memory from JIT code?
>>> Is there any known concern about this direction?
>> 
>> I see.
>> 
>> I tend to agree with Phil that inlining bmalloc allocation is possible but it is better to avoid doing so unless we meet a high burden of proof.
>> 
>> FWIW, in MallocBench benchmarks I haven’t seen a significant speedup from inlining the allocation path — even though doing so allows you to constant-fold the allocator selection math.
>> 
>> Great,
>>  
>> 
>> I guess you could start by just inlining the Slice and Substring operations, and then seen how much additional speedup you got by inlining the allocation too.
>> 
>> Geoff
>> 
>> Phil's idea, allocating StringImpl part in JSString sounds really nice as a future direction.
>> I think handling JSString specially in our GC is good.
>> JSString is a primitive value in JS. This has super nice property since it cannot create a reference to the other JSObject.
>> It may have a reference to the other JSString. By structuring JSString & JSRopeString carefully, we can ensure there are no cyclic references in JSStrings.
> 
> I don’t think that’s super useful.
> 
>> Maybe, in this situation, respecting reference counting of JSString's StringImpl would be not so difficult.
>> I don't have a detailed algorithm right now, but this direction sounds the best.
> 
> 
> We do not want to make JSString strictly reference counted.  JSString would be *both* reference-counted and traced.  When the GC runs, it would check if any strings had non-zero reference counts.  If so, it will mark them.  Thus, JSString would stay either if it is referenced from JS or if is referenced from C++.  The way to make this scalable is to give MarkedBlock an extra bitvector that tells which objects ever had a non-zero ref count.  JSString::ref() would set the bit if it’s being called for the first time.
> 
> In this world, it’s JSRopeString would refer to other JSStrings using the GC style: no ref counting, just a visitChildren method.  If the JSRopeString is ref()’d from C++, the GC will mark it, causing its visitChildren method to mark any strings it references.
> 
> Yeah. It’s logically (not in terms of perf) the same to converting Ref<StringImpl> to Strong<JSString>, correct?
> At this time, the important thing is that we should not create a cyclic ref like
> 
> JSWrapper -> Wrapped -> Strong<JSString> -> ... -> JSWrapper
> 
> But the above cyclic refs are never shown since JSString is a primitive. It does not refer to the other JS objects ;)

Right!

Yes, logically it means that Ref<T> becomes Strong<T>, provided that T’s MarkedBlock can handle the reference count bits.  I think that we could just make it a default feature of MarkedBlock.  This can scale because of how MarkedBlock state is encoded in bitvectors in MarkedAllocator.  I would add a m_hasRefCounts bitvector to MarkedAllocator, and set the bit if a MarkedBlock ever allocates the hasRefCounts bitvector for its objects, which only happens on first JSCell::ref() to any cell in that MarkedBlock.  Then you can just have a GC MarkingConstraint that loops over all unmarked objects’ ref count by searching for all blocks that have ref counts and then searching the hasRefCount bits to find just those objects that have ref counts.  Searching bitvectors of all zeroes is super fast, since a single cycle of the loop (loading and testing 32 bits) can skip over 32 * 16,384 = 524,288 bytes of MarkedBlock if ref counts are not in use.

-Filip

> 
> 
> -Filip
> 
> 
> -- 
> Regards,
> Yusuke Suzuki

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.webkit.org/pipermail/jsc-dev/attachments/20171102/fb02d261/attachment-0001.html>