[Webkit-unassigned] [Bug 161581] New: Speed up GC by using indexed collections to track MarkedBlock::Handles and WeakSets and using bitvectors to track their states

Sun Sep 4 18:59:12 PDT 2016

https://bugs.webkit.org/show_bug.cgi?id=161581

            Bug ID: 161581
           Summary: Speed up GC by using indexed collections to track
                    MarkedBlock::Handles and WeakSets and using bitvectors
                    to track their states
    Classification: Unclassified
           Product: WebKit
           Version: WebKit Nightly Build
          Hardware: All
                OS: All
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: JavaScriptCore
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: fpizlo at apple.com

For example, each MarkedAllocator could have a Vector<std::unique_ptr<MarkedBlock::Handle>> m_blocks.  Then it could have the following sets represented by bitvectors indexed by the handle's index in m_blocks:
- m_canAllocate, which contains all non-empty non-retired blocks.
- m_snapshot, which contains all blocks in the block snapshot.
- m_empty, which contains all empty blocks.
- m_eden, which contains all blocks that have new objects.
- m_allocated, which contains all blocks that have been completely filled up.

If we do this, then we can get rid of block state, the m_blocksWithNewObjects vector, and the block snapshot vector.  The following operations become cheap operations over bitvectors:

- All of the flips:
    - Full GC flip: m_empty = 0, m_canAllocate = 0, m_allocated = 0.
    - Eden GC flip: m_canAllocate |= m_eden, m_allocated = 0.
    - Full GC snapshot: m_snapshot = 1.
    - Eden GC snapshot: m_snapshot |= m_eden.

- All of the iterations:
    - Allocate: scan m_canAllocate, then scan m_empty, then scan m_empty of other size classes (so we can rebalance the heap cheaply!).
    - Incremental sweep: scan (m_snapshot & ~m_allocated).

We could allocate WeakSets outside MarkedBlocks, and have them also be tracked in their own Vector<std::unique_ptr<WeakSet>>, owned by MarkedSpace, and also track various sets of them using bitvectors.

This ought to reduce any per-block overheads by so much that they should appear invisible.  We did lose some perf by going from 64KB to 16KB even with many optimizations, so this may be enough to bring us back to where we were before.  For example, consider what happens when we are using 200MB.  That means having to sometimes consider about 13000 blocks, which the current code does by looping over that many things, and those things are unlikely to reside on the same cache line.  But with bitvectors, the typical flipping operation would only have to touch 13000 / 8 = 1625 bytes of contiguous bits.  Using 64-bit operations, that means only having to loop over about 200 words.  That means that the previous flipping operations that would take about 13000 iterations just to touch each block will now only require 200 iterations - as in we actually have a chance of speeding up those operations by 65x.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.webkit.org/pipermail/webkit-unassigned/attachments/20160905/9db0b6fc/attachment.html>