[Webkit-unassigned] [Bug 200863] Crash in JSC::SlotVisitor::visitChildren

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Nov 15 05:14:45 PST 2022


https://bugs.webkit.org/show_bug.cgi?id=200863

--- Comment #15 from Krzysztof Konopko <kris at youview.com> ---
(In reply to Mark Lam from comment #13)
> Tips for debugging a GC related crash (like this one):
> 

Thanks!  Very much appreciated!

> 1. Does it reproduce with JSC_useGenerationalGC=0?

Doesn't seem so.

> 2. Does it reproduce with JSC_useConcurrentGC=0?
>

Less likely.  Initially we used this as a work-around and believed it alleviated the issue, until I recently saw a crash despite `JSC_useConcurrentGC=0` being set, but that was with a full-blown web app.  See this comment:  https://bugs.webkit.org/show_bug.cgi?id=200863#c7

>    These test if you have some sort of missing write barrier issue.
> 

Yup, there seems to be an issue with a barrier.

> 3. Does running with JSC_useZombieMode=1 make it reproduce more easily?
> 
>    Rules out incremental sweeping as a factor.
>    Plus, helps make GC issues manifest sooner, though it may perturb the
> timing of the run and hide the issue.
> 

The reproducibility seems to be the same, i.e. it's still quite easy to reproduce the crash on the "custom AArch64 platform" with the attached example and the additional logging patch.

> 4. Does it reproduce with a Debug build?
> 
>    Helps make things easier to debug.
>    Plus, enables a lot more assertions to check invariants.
> 

Yes, although it's more difficult to reproduce, and I haven't managed to reproduce it with the simplified example attached.  It was reproducible with a bigger web app, though, with many other things going on.  The crash looked the same.

I do reproduce it with a release build with debug symbols, though, using the attached example.  I can try a debug build again.

> 5. Does running with JSC_verifyGC=1 report any errors?
> 
>    Helps catch potential concurrent GC and generational GC issues and points
> to where the issue potentially is.
>    Note: though rare, may report a false positive.
> 

Quickly checking it, I don't see any errors, although with the amount of logging enabled I could be missing something.  Will take a closer look.
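The experiments in items 1 to 5 compare how often the crash reproduces under different JSC options, so it can help to run every configuration the same number of times and compare crash counts directly instead of judging "less likely" by feel. A minimal sketch in Python, assuming the reproduction can be triggered by a single command that exits non-zero (or is killed by a signal) when the Web process crashes, and that the Web process inherits the launcher's environment. The script name, the reproduction command and the run count are placeholders, not part of WebKit:

```python
import os
import subprocess
import sys

# JSC option overrides from the checklist (items 1-5). JSC picks up options
# from environment variables named JSC_<optionName>.
EXPERIMENTS = {
    "baseline": {},
    "no-generational-gc": {"JSC_useGenerationalGC": "0"},
    "no-concurrent-gc": {"JSC_useConcurrentGC": "0"},
    "zombie-mode": {"JSC_useZombieMode": "1"},
    "verify-gc": {"JSC_verifyGC": "1"},
}


def count_crashes(cmd, extra_env, runs):
    """Run `cmd` `runs` times with `extra_env` layered over the current
    environment; return how many runs ended abnormally (non-zero exit
    status, or a negative status meaning the process was killed by a signal)."""
    env = dict(os.environ)
    env.update(extra_env)
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(cmd, env=env,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            crashes += 1
    return crashes


def run_matrix(cmd, runs=20):
    """Return {experiment name: crash count} for every configuration."""
    return {name: count_crashes(cmd, extra_env, runs)
            for name, extra_env in EXPERIMENTS.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python gc_matrix.py <reproduction command...>
    for name, crashes in run_matrix(sys.argv[1:]).items():
        print(f"{name}: {crashes} crashes")
```

Output is discarded here to keep the sketch short, so any `JSC_verifyGC=1` reports would need to be captured separately (for example by redirecting stderr to a per-run log file). With an intermittent crash, a larger `runs` value makes the counts more meaningful.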

> Some thoughts on your specific issue:
> 6. This appears to reproduce only on your "custom AArch64 platform".
> 
>    Is this "custom AArch64 platform" stable?

It's supposed to be, but what platform can be considered stable these days?  There's a chance it's a platform issue which I do not rule out.

>    Have you ruled out silicon or OS kernel bugs?
> 

These are always possible, aren't they?

>    From my past experience in the real world (not theoretical), I've known
> new CPUs (from a vendor whom I shall not name but is not Apple) to have
> either silicon or OS kernel configuration bugs that result in concurrency
> issues where the hardware itself does not enforce proper memory coherency
> despite the presence of the needed memory barriers.  Has this been ruled out
> yet?
> 

Not ruled out.  See this comment though:  https://bugs.webkit.org/show_bug.cgi?id=200863#c12

This suggests the crash is not necessarily specific to the "custom AArch64 platform" where I can reproduce it.  The original report was for x86_64.

> 7. If you're running on custom silicon, are you also adding custom code to
> WebKit e.g. new types of Objects that are JSCells, or new functions that
> allocate and manipulate JSCells?
> 

No.

>     If so, are you sure you have issued write barriers in all the needed
> places?
> 

Not applicable.

>     One way to test this is to see if your issue still reproduces with the
> concurrent and generational GC disabled (see (1) and (2) above).
> 

(1) seems not (easily) reproducible, (2) is known to still reproduce the crash although not so easily.

>     If the reproduction stops, the next thing is to turn those back on, and
> start sprinkling write barriers liberally in your code to see if it makes the
> issue go away.
> 
>     If it does, gradually remove this sprinkling of write barriers, and see
> which one re-introduces the crash.  If you've isolated it, then audit the
> code around there to figure out why that write barrier is needed, or not.
> 

Not applicable, we don't have any custom code.  In fact, I ruled this out with a pristine WPE release build for the "custom AArch64 platform" and no patches/modifications.

We do have a custom UI process (launcher) but the crash is in the Web process.
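For anyone who does carry custom JSCell code, the "sprinkle, then remove" procedure Mark describes can be organised as a search over barrier sites. Removing barriers one at a time needs as many rebuilds as there are sites; if one assumes exactly one barrier is actually needed, the search can be done by halving instead. A minimal sketch in Python, where `find_needed_barrier` and the `crashes_with` callback are hypothetical names: `crashes_with` stands for whatever rebuilds and reruns the reproduction with only the given barriers added, and reports whether the crash came back:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_needed_barrier(sites: Sequence[T],
                        crashes_with: Callable[[Sequence[T]], bool]) -> T:
    """Binary-search for the single write-barrier site whose absence
    brings the crash back.

    Assumptions (illustrative only):
      * with every site in `sites` enabled, the crash does not reproduce;
      * exactly one of those sites is actually needed;
      * crashes_with(enabled) returns True if the crash reproduces when only
        `enabled` barriers are added; for an intermittent crash it should
        repeat the run enough times for the answer to be trustworthy.
    """
    candidates = list(sites)
    if not candidates:
        raise ValueError("no barrier sites given")
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        if crashes_with(first):
            # The first half alone is not enough, so the needed site is in
            # the second half.
            candidates = second
        else:
            # The first half alone prevents the crash, so it holds the site.
            candidates = first
    return candidates[0]
```

If more than one barrier is needed, the single-culprit assumption breaks and a more general technique such as delta debugging would be required.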

> There are also advanced techniques for debugging GC issues using
> JSC_verifyHeap=1 that requires writing a lot of custom code carefully:
> requires knowing what you are doing with GC related code, and understanding
> the art of bisecting bugs in time (vs in space).  It's not a turn key
> solution for debugging such issues, but if you're the type who can dive in
> and reason deeply about how the system works, you can use this to help
> isolate the issue ... assuming it is a software issue.

OK, I will try to explore this avenue, and also look in more detail at any reports from `JSC_verifyGC=1`.

And what do you think of the Valgrind warnings, as in this comment: https://bugs.webkit.org/show_bug.cgi?id=200863#c8 ?
