[webkit-dev] Another WPT bite

Fri May 12 14:49:53 PDT 2017

> On May 12, 2017, at 2:39 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> 
> On Fri, May 12, 2017 at 12:04 PM, Alexey Proskuryakov <ap at webkit.org> wrote:
>> 
>> On May 12, 2017, at 11:52, Ben Kelly <ben at wanderview.com> wrote:
>> 
>> On Fri, May 12, 2017 at 2:26 PM, Rick Byers <rbyers at chromium.org> wrote:
>>> 
>>> On Fri, May 12, 2017 at 2:06 PM, Alexey Proskuryakov <ap at webkit.org>
>>> wrote:
>>>> 
>>>> Since imported WPT tests are very flaky, and are not necessarily written
>>>> to defend against important regressions, investigating issues with them is
>>>> relatively lower priority than investigating issues observed with WebKit
>>>> tests. So I would recommend not mixing tests for WebKit regressions with WPT
>>>> tests - if your test eventually ends up in LayoutTests/imported, it will
>>>> become a lot less effective.
>>> 
>>> 
>>> FWIW this is absolutely NOT how we're treating this in chromium.  If this
>>> is how things end up in practice then we will have failed massively in this
>>> effort.
>>> 
>>> We figure if we want the web to behave consistently, we really have no
>>> choice but to treat web-platform-tests as first class with all the
>>> discipline we give to our own tests.  As such we are actively moving many of
>>> our LayoutTests to web-platform-tests and depending entirely on the
>>> regression prevention they provide us from there.  Obviously there will be
>>> hiccups, but because our product quality will depend on web-platform-tests
>>> being an effective and non-flaky testsuite (and because we're starting to
>>> require most new features have web-platform-tests before they ship), I'm
>>> confident that we've got the incentives in place to lead to constant
>>> ratcheting up the engineering discipline and quality of the test suite.
>> 
>> 
>> FWIW, mozilla also treats WPT as first class tests.  We're not actively
>> moving old tests to WPT like google, but all new tests (at least in DOM) are
>> being written in WPT.  Of course, we do have exceptions for some tests that
>> require gecko-specific features (forcing GC, etc).
>> 
>> 
>> We don't have a concept of "first class", but I hope that when choosing
>> between looking into a simple test that was added for a known important bug,
>> and looking into an imported test whose importance is unclear, any WebKit
>> engineer will pick the former. And since no one can fix all the things, such
>> prioritization makes imported tests less effective.
> 
> This is absolutely not how I operate at all. Since almost all custom
> elements and shadow DOM API tests I wrote are written using
> testharness.js and upstreamed to web-platform-tests, they have been
> deleted from LayoutTests/fast/shadow-dom &
> LayoutTests/fast/custom-elements.
> 
> As such, if any new shadow DOM or custom elements tests under
> LayoutTests/imported/ start to fail, then we must fix them since we
> don't have any other test coverage.

Our normal approach to imported conformance test suites is to treat them as seriously as any other test. Even when a test was originally written along with a specific bug report, we don't necessarily think about that when prioritizing its importance, since it often fails in a way that does not make clear whether the exact original bug is back.

It seems like there are two unusual things about WPT:
- We pull from upstream more often, and upstream is evolving at a good pace. So it's more of a moving target than something like the old W3C DOM test suite.
- At least according to Alexey, WPT tests are somewhat prone to flakiness in Safari.

It seems like the first issue is something we need to adapt to, to get the best value from WPT. But the second issue is something that has to be resolved within WPT (perhaps with our help). We can't have a lot of our testing depend on a flaky process.

This is separate from the issues about ease of running those tests. On that, I agree with Sam:

> On May 12, 2017, at 2:38 PM, Sam Weinig <weinig at apple.com> wrote:
> 
> I regret piling on here, as I think this thread has diverged from its original purpose, but…I understand this frustration. That said, perhaps this is something we can solve with some tooling. For instance, a run-test-in-safari (as a parallel to run-safari) script could be made which starts the server, and then loads the test with the right URL in your built Safari (or MiniBrowser, or whatever).

I think the pain can be reduced with tooling. The right tools might need to be a bit more subtle. You might want to reload the test repeatedly in the same Safari instance, or perhaps load it into already-running Safari. So maybe load-test-in-safari (that ensures the server is running or launches it, then loads the right URL for a test, and maybe even does the right thing for http, web sockets or plain-file tests) would be closer to the mark. It may still be a little inconvenient but it seems like we can make it significantly better.
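
To make the idea concrete, here is a minimal sketch of the core of such a load-test-in-safari helper. It is not an existing WebKit script. The function names, the directory prefixes, and the port numbers are all assumptions chosen for illustration and would need to match whatever the local servers actually use. The sketch only covers picking the right URL, checking whether a server is already listening, and handing the URL to an already-running Safari through the macOS `open` command. Launching the servers and handling web sockets are left out.

```python
import socket
import subprocess
from pathlib import PurePosixPath

# Assumed values for illustration only; adjust to the local server setup.
WPT_PORT = 8800
HTTP_PORT = 8000
WPT_PREFIX = "imported/w3c/web-platform-tests/"
HTTP_PREFIX = "http/tests/"


def url_for_test(test_path, layout_tests_dir="/path/to/LayoutTests"):
    """Map a path relative to LayoutTests to the URL it should be loaded from."""
    if test_path.startswith(WPT_PREFIX):
        # Imported WPT tests are served from the WPT server root.
        return f"http://localhost:{WPT_PORT}/{test_path[len(WPT_PREFIX):]}"
    if test_path.startswith(HTTP_PREFIX):
        # HTTP tests are served from the HTTP test server root.
        return f"http://127.0.0.1:{HTTP_PORT}/{test_path[len(HTTP_PREFIX):]}"
    # Everything else is treated as a plain-file test.
    return "file://" + str(PurePosixPath(layout_tests_dir) / test_path)


def server_is_running(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def load_in_safari(url):
    """Ask macOS to open the URL in Safari, reusing a running instance if any."""
    subprocess.run(["open", "-a", "Safari", url], check=True)
```

A wrapper script would call `server_is_running(WPT_PORT)` first, start the server if that returns False, and then pass `url_for_test(...)` to `load_in_safari`. Because `open -a` hands the URL to a running Safari, repeated invocations reload the test without relaunching the browser.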

Regards,
Maciej