[Webkit-unassigned] [Bug 200863] Crash in JSC::SlotVisitor::visitChildren

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Nov 15 08:19:04 PST 2022


https://bugs.webkit.org/show_bug.cgi?id=200863

--- Comment #18 from Mark Lam <mark.lam at apple.com> ---
(In reply to Krzysztof Konopko from comment #15)
> (In reply to Mark Lam from comment #13)
> > Tips for debugging a GC related crash (like this one):
> > 
> 
> Thanks!  Very much appreciated!
> 
> > 1. Does it reproduce with JSC_useGenerationalGC=0?
> 
> Doesn't seem so.

This is interesting if this continues to be true.

> > 2. Does it reproduce with JSC_useConcurrentGC=0?
> 
> See this comment:  https://bugs.webkit.org/show_bug.cgi?id=200863#c7

I saw, but I had to add this case too for the benefit of anyone else seeking to learn about GC debugging by reading these comments.

> Yup, there seems to be an issue with a barrier.

I think this is a likely scenario ... or something that has the effect of a missing barrier.


> > 3. Does running with JSC_useZombieMode=1 make it reproduce more easily?
> > 
> >    Rules out incremental sweeping as a factor.
> >    Plus, helps make GC issues manifest sooner, though it may perturb the
> > timing of the run and hide the issue.
> > 
> 
> The reproducibility seems to be the same, i.e. it's still quite easy to
> reproduce the crash on "custom AArch64 platform" with the attached example
> and additional logging patch.

If the reproducibility is the same, then always run with useZombieMode=1 while you're still debugging this.  It can only help.

> > 4. Does it reproduce with a Debug build?
> > 
> >    Helps make things easier to debug.
> >    Plus enable a lot more assertions to check invariants.
> > 
> 
> Yes, although it's more difficult to reproduce, and haven't managed to
> reproduce it with the simplified example attached.  It was reproducible with
> a bigger web app though and many other things going on.  The crash looked
> the same.
> 
> I do reproduce it with a release build with debug symbols though using the
> attached example.  Can try again a debug build.

A Release build with debug symbols does not add new info.
A Debug build with optimizations forced to -O3 will add info.

On Mac builds, it's easy to force the build to use -O3 (see `set-webkit-configuration --force-opt=O3`).  However, that mechanism to force O3 only works with Xcode builds.  You should look into doing the same with your own build system and see if it reproduces with the Debug build.

If speed (and therefore timing) is the reason it stops reproducing, then forcing O3 should make it easy to reproduce again.

Using the Debug build is interesting because, as I said earlier, it will "enable a lot more assertions to check invariants".  This helps catch the bug earlier.

> > 5. Does running with JSC_verifyGC=1 report any errors?
> > 
> >    Helps catch potential concurrent GC and generational GC issues and
> > points to where the issue potentially is.
> >    Note: though rare, may report a false positive.
> > 
> 
> Quickly checking it, I don't see any errors, although with the amount of
> logging enabled I could be missing something.  Will take a closer look.

You can run with JSC_verifyGC=1 on Release builds too.


> > Some thoughts on your specific issue:
> > 6. This appears to reproduce only on your "custom AArch64 platform".
> > 
> >    Is this "custom AArch64 platform" stable?
> 
> Not ruled out.  See this comment though: 
> https://bugs.webkit.org/show_bug.cgi?id=200863#c12

Is your crash the same as #c12?  There may still be a source of crashes in the code base that manifests only rarely, and #c12 can be one of those.  In your case, are you just seeing the crash once in a blue moon (to the extent that it's not reproducible on demand)?  Or can you always reproduce it simply by running some workload for some determined period of time?  It sounded like your scenario is the latter, which implies that you have a different bug here.

Also see https://bugs.webkit.org/show_bug.cgi?id=200863#c17 regarding possible root causes of this crash.

The question to ask is: are you also seeing other types of inexplicable crashes in other parts of the system at about the same rate as the SlotVisitor::visitChildren crash?  The answer to this tells you whether there is something else at play in the core platform below WebKit.

> > There are also advanced techniques for debugging GC issues using
> > JSC_verifyHeap=1 that requires writing a lot of custom code carefully:
> > requires knowing what you are doing with GC related code, and understanding
> > the art of bisecting bugs in time (vs in space).  It's not a turn key
> > solution for debugging such issues, but if you're the type who can dive in
> > and reason deeply about how the system works, you can use this to help
> > isolate the issue ... assuming it is a software issue.
> 
> OK, will try to explore this avenue, also look into more detail for any
> reports from `JSC_verifyGC=1`.
> 
> And what do you think of Valgrind warnings, as in this comment:
> https://bugs.webkit.org/show_bug.cgi?id=200863#c8
> ?

As Justin explained in https://webkit.slack.com/archives/CU5LWFM28/p1668090516378259?thread_ts=1668089239.370399&cid=CU5LWFM28, Valgrind makes assumptions about how the code works.  Valgrind is not knowledgeable about how JSC works, and JSC uses a lot of advanced and tricky algorithms that Valgrind has no way to know about.  As a result, you'll just be looking at a lot of (possibly all) false positives.  If you want to continue to use it, you're on your own in deciphering whether a reported error is real or one of the many false positives that Valgrind will report.

Right now, your data suggests that there may be a missing write barrier.  The other possibility is that you have a compiler bug.  Are you using gcc?  Does replacing it with clang change the rate of reproduction significantly?
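To judge whether a change in the rate of reproduction is "significant" rather than noise, one option is a Fisher exact test on the crash counts from the two builds.  This is only a sketch: the function name and the idea of counting crashes over a fixed number of runs are assumptions for illustration, not something WebKit provides.

```python
from math import comb  # requires Python 3.8+

def fisher_exact_two_sided(crashes_a, runs_a, crashes_b, runs_b):
    """Two-sided Fisher exact p-value for crash counts from two builds.

    Hypothetical helper: build A (e.g. gcc) crashed crashes_a times in
    runs_a runs, build B (e.g. clang) crashed crashes_b times in runs_b runs.
    """
    total_crashes = crashes_a + crashes_b
    total_runs = runs_a + runs_b

    def prob(k):
        # Hypergeometric probability that build A accounts for exactly k
        # of the crashes if both builds really crash at the same rate.
        return (comb(runs_a, k) * comb(runs_b, total_crashes - k)
                / comb(total_runs, total_crashes))

    observed = prob(crashes_a)
    lowest = max(0, total_crashes - runs_b)
    highest = min(runs_a, total_crashes)
    # Sum every outcome that is at most as likely as the observed one.
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))
```

A small p-value (say, below 0.05) suggests the two builds really do reproduce at different rates; a large one means the run counts are too small to tell them apart.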

Anyway, I've suggested / implied action items above to follow up on.  You can also try using JSC_verifyHeap=1 if you like, but doing so requires that it is able to detect memory corruption due to the bug.  You can always just try running with JSC_verifyHeap=1 as is and see if the current default configuration will already report an error (in the form of a crash).  FYI, JSC_verifyGC=1 also reports its error with a crash.
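For anyone following along, the option checks above can be scripted so that each configuration is run the same number of times and the crash rates can be compared.  The following is a minimal sketch, assuming a POSIX system, that the JSC options are passed as environment variables as written in this thread, and that the reproduction is a command that exits on its own; the configuration names and the repro command are hypothetical.

```python
import os
import subprocess

# JSC options named in this thread, one configuration per question.
CONFIGS = {
    "baseline": {},
    "no-generational": {"JSC_useGenerationalGC": "0"},
    "no-concurrent": {"JSC_useConcurrentGC": "0"},
    "zombie": {"JSC_useZombieMode": "1"},
    "verify-gc": {"JSC_verifyGC": "1"},
    "verify-heap": {"JSC_verifyHeap": "1"},
}

def build_env(options, base=None):
    """Return a copy of the base environment with the JSC options applied."""
    env = dict(os.environ if base is None else base)
    env.update(options)
    return env

def crashed(returncode):
    """On POSIX, subprocess reports death by signal as a negative code."""
    return returncode < 0

def crash_rate(cmd, options, runs=20, timeout=600):
    """Run cmd `runs` times under the given options; return the crash fraction."""
    crashes = 0
    for _ in range(runs):
        try:
            result = subprocess.run(cmd, env=build_env(options),
                                    timeout=timeout,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            continue  # a hang is not counted as a crash here
        if crashed(result.returncode):
            crashes += 1
    return crashes / runs

def report(cmd, runs=20):
    """Print the crash rate of cmd for every configuration in CONFIGS."""
    for name, options in CONFIGS.items():
        print(f"{name:16s} {crash_rate(cmd, options, runs):.0%}")
```

Calling `report(["./repro-app", "example.html"])` (a hypothetical command) prints one line per configuration.  Since JSC_verifyGC=1 and JSC_verifyHeap=1 report their errors with a crash, this count cannot tell a verifier report from the original crash; inspect the backtrace to tell them apart.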
