[Webkit-unassigned] [Bug 220794] New: run-jsc-stress-tests doesn't handle dead remotes in detectFailures
bugzilla-daemon at webkit.org
Thu Jan 21 06:36:47 PST 2021
https://bugs.webkit.org/show_bug.cgi?id=220794
Bug ID: 220794
Summary: run-jsc-stress-tests doesn't handle dead remotes in detectFailures
Product: WebKit
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Status: NEW
Severity: Normal
Priority: P2
Component: JavaScriptCore
Assignee: webkit-unassigned at lists.webkit.org
Reporter: angelos at igalia.com
When a remote board goes away while run-jsc-stress-tests is running, the --gnu-parallel-runner reschedules the tests properly, but detectFailures can fail in a number of ways:
- if the board is down when detectFailures runs, it gets a connection error and fails the whole test run
- if the board has come back up, there's no guarantee that the failure files are still there. In fact, the mips boards will recreate the R/W filesystem if fsck detects any errors on boot, which means that everything under the remoteDirectory is gone.
One way to handle this case would be to also restart jobs for which we weren't able to get the PASS/FAIL status, perhaps by including the fetch in the command invocation so that GNU parallel transparently handles this for us. I guess this means we need to move away from detectFailures on --gnu-parallel-runner.
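A rough sketch of the "restart jobs whose status we couldn't get" idea, in Python for illustration only (run-jsc-stress-tests itself is not written this way). The names run_until_status_known and run_and_fetch, and the max_attempts limit, are hypothetical. The assumption is that running the job and fetching its status happen in one step, which returns None whenever the status can't be retrieved:

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical callback: runs a job and fetches its status in one step.
# Returns "PASS" or "FAIL", or None when the status could not be retrieved
# (board unreachable, status files wiped by a reboot, ...).
RunAndFetch = Callable[[str], Optional[str]]


def run_until_status_known(
    jobs: Iterable[str],
    run_and_fetch: RunAndFetch,
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Run each job, rerunning it while its PASS/FAIL status is unknown."""
    results: Dict[str, str] = {}
    for job in jobs:
        status: Optional[str] = None
        for _ in range(max_attempts):
            outcome = run_and_fetch(job)
            if outcome in ("PASS", "FAIL"):
                status = outcome
                break
        # A job that never produced a status is reported as such,
        # never silently counted as a pass.
        results[job] = status if status is not None else "UNKNOWN"
    return results
```

Because the fetch is part of the job, a failed fetch looks the same as a failed run to whatever schedules the jobs, so no separate pass over the remotes is needed afterwards.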
Note that detectFailures is fundamentally flawed in any case: it should be actively confirming that the job finished successfully, not relying on the absence of a 'failure' file.
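To illustrate the difference between relying on the absence of a failure file and actively confirming success, here is a minimal Python sketch. The ".pass" and ".fail" marker names and the classify_job helper are made up for the example; they are not the files run-jsc-stress-tests actually writes:

```python
from pathlib import Path


def classify_job(status_dir: Path, job: str) -> str:
    """Classify a job from explicit marker files (hypothetical naming)."""
    # An explicit failure marker always wins.
    if (status_dir / f"{job}.fail").exists():
        return "FAIL"
    # Success is only reported when it was positively recorded.
    if (status_dir / f"{job}.pass").exists():
        return "PASS"
    # No marker at all (e.g. the filesystem was recreated) is NOT a pass.
    return "UNKNOWN"
```

With this shape, a wiped status directory yields UNKNOWN for every job instead of a clean run.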