[Webkit-unassigned] [Bug 238256] New: EWS queue gets stuck when patch causes tests to hang

Wed Mar 23 06:32:16 PDT 2022

https://bugs.webkit.org/show_bug.cgi?id=238256

            Bug ID: 238256
           Summary: EWS queue gets stuck when patch causes tests to hang
           Product: WebKit
           Version: WebKit Nightly Build
          Hardware: Unspecified
                OS: Unspecified
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: Tools / Tests
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: angelos at igalia.com

Starting with https://ews-build.webkit.org/#/builders/46/builds/21295, patch https://bugs.webkit.org/attachment.cgi?id=455076&action=prettypatch apparently caused enough of the JSC tests to hang that make(1) was stuck waiting for its jobs to finish. Since no process was producing any output anymore, buildbot killed the whole run:

command timed out: 1200 seconds without output running ['perl', 'Tools/Scripts/run-javascriptcore-tests', '--no-build', '--no-fail-fast', '--json-output=jsc_results.json', '--release', '--memory-limited', '--verbose', '--jsc-only', '--treat-failing-as-flaky=0.6,10,200'], attempting to kill

Even worse, this resulted in a RETRY, which failed in the same way and resulted in another RETRY, and so on, so the patch was tested over and over for days.

Ideas:

1. Handle only this specific issue by running each test under `/usr/bin/timeout` so that make doesn't get stuck. Clearly, that addresses only this specific cause, not the failure mode.

2. Somehow keep state and only allow a limited number of retries (perhaps just one?). If the tests without the patch consistently return results but the ones with the patch don't, then it's a good guess that this is not a transient infrastructure issue but a problem caused by the patch itself. The above patch would be an example of that, but any patch making changes to `run-jsc-stress-tests` could result in such a failure mode. That said, I've skimmed the docs and it doesn't look like buildbot offers a simple way to keep state between builds.

3. Have an explicit (and cheap!) checkThatTheInfrastructureWorks step and use it judiciously. If a test fails without even producing results, run checkThatTheInfrastructureWorks. If it fails, RETRY. If not, declare the patch a failure.
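
To make ideas 2 and 3 a bit more concrete, here is a minimal sketch in Python of the decision taken when a test step dies without producing any results. Everything in it is hypothetical: `Verdict`, `verdict_for_missing_results`, the `infrastructure_works` callable (standing in for checkThatTheInfrastructureWorks) and the retry counter are illustrative names, not existing buildbot or EWS APIs, and the retry cap assumes that state can be carried between builds somehow.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    RETRY = "retry"
    FAILURE = "failure"


def verdict_for_missing_results(infrastructure_works: Callable[[], bool],
                                retries_so_far: int = 0,
                                max_retries: int = 1) -> Verdict:
    """Decide what to do when a test step produced no results at all.

    infrastructure_works is a cheap probe (hypothetical) that returns True
    when the bot itself is healthy.
    """
    # Idea 2: never retry more than max_retries times, whatever the cause.
    if retries_so_far >= max_retries:
        return Verdict.FAILURE
    # Idea 3: a healthy bot means the patch is the likely culprit.
    if infrastructure_works():
        return Verdict.FAILURE
    # The bot looks broken, so this is plausibly transient: try again.
    return Verdict.RETRY
```

With this shape, a patch that hangs the tests on a healthy bot is rejected after the first timeout instead of being retried indefinitely.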

I can come up with more schemes, but these seem like the cheapest ones.
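
For reference, the per-test time limit from idea 1 amounts to something like the following sketch. It is written in Python purely for illustration; the actual proposal is to wrap each test in `/usr/bin/timeout`, and `run_with_timeout` is a hypothetical helper, not part of the existing scripts. Note that `subprocess.run` only kills the direct child on timeout, so a test that spawns its own children would need extra handling.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(cmd: List[str], seconds: float) -> Optional[int]:
    """Run cmd and return its exit status, or None if it exceeded the limit.

    On timeout, subprocess.run kills the child and waits for it, so the
    caller is never left blocked on a hung test.
    """
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # A stand-in for a hung test: sleeps far longer than the 1 second limit.
    hung_test = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(hung_test, 1.0))
```

A `None` result can then be reported as an ordinary test failure, so the runner keeps producing output and never trips buildbot's 1200 second no-output limit.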
