[webkit-dev] Proposal for Device-Specific Layout Tests

Jonathan Bedard jbedard at apple.com
Thu Dec 13 10:17:41 PST 2018



> On Dec 12, 2018, at 11:28 PM, Maciej Stachowiak <mjs at apple.com> wrote:
> 
> 
> 
>> On Dec 12, 2018, at 2:20 PM, Jonathan Bedard <jbedard at apple.com> wrote:
>> 
>>> 
>>> On Dec 12, 2018, at 11:16 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>>> 
>>> 
>>> 
>>>> On Dec 12, 2018, at 10:07 AM, Jonathan Bedard <jbedard at apple.com> wrote:
>>>> 
>>>> Ryosuke and I discussed this on Monday, and in passing, Ryosuke mentioned that he personally finds something like this:
>>>> 
>>>> 	<test-name>.html
>>>> 	<test-name>-expected.txt
>>>> 	<test-name>-expected-<device-type>.txt
>>>> 
>>>> more clear than the directory method I proposed. After implementing the above approach in the patch uploaded to <https://bugs.webkit.org/show_bug.cgi?id=192162>, I’m inclined to agree. Ryosuke’s approach achieves everything we need for device-type specific expected results.
>>>> 
>>>> This still doesn’t solve disagreements about how to organize test results when a single test is run on multiple device types, but it seems like a step in the right direction.
>>> 
>>> In my opinion, we should think about what kind of device and platform differences we expect, and see if it can be organized into a single model. It strikes me as odd to have two totally different ways to organize variant results. And we don’t necessarily need to consider different platforms to be only targets with different binaries.
>> 
>> I think that we have 4 major reasons for differing expected results on different platforms:
>> 	1) Missing feature in the test harness
>> 	2) Feature differentiation
>> 	3) Bug (or quirk) from the device-type/platform/OS
>> 	4) Tests sensitive to screen size and graphics support (deep color, for example)
>> 
>> #1 and #3 are well covered under our current scheme, and I haven’t seen these types of differences connected to device type. #2 is usually connected to platform, occasionally OS version and in a few notable cases, iPad vs iPhone. #4 is pretty much exclusively tied to device type and is a difference that we have mostly ignored.
>> 
>>> One issue with these flat device names is that they have no hierarchy. It was hard for me to tell if your iPhone 7 vs iPhone 8 example of a difference was real or just imaginary, but I’d expect more tests to be different for iPhone vs iPad than different for iPhone 7 vs iPhone 8, so it would be nice to have a hierarchy for iPhone as a device class with different types of iPhones under it.
>> 
>> The iPhone 7 vs iPhone 8 difference was contrived, but even now we have a set of tests which must be run on an iPhone 7 instead of an iPhone SE because iPhone 7 supports deep color. So there are definitely circumstances where iPhones might have different expected results from other iPhones.
>> 
>> In this proposal, I haven’t detailed the specifics of parsing device types because that code already exists for creating simulated devices. At the moment, device types aren’t implemented as a hierarchy; they’re implemented by creating objects which use an “is a” comparison. A log from a simulator test run shows this pretty well <https://build.webkit.org/builders/Apple%20iOS%2012%20Simulator%20Release%20WK2%20%28Tests%29/builds/1491/steps/layout-test/logs/stdio>.
>> 
>> Creating devices for iPhone SE running iOS 12
>> 11:52:45.780 92192 Creating device 'Managed 0', of type iPhone SE running iOS 12
>> Creating devices for iPhone 7 running iOS 12
>> 12:21:30.931 92192 Creating device 'Managed 0', of type iPhone 7 running iOS 12
>> Creating devices for iPad running iOS 12
>> 12:21:49.425 92192 Creating device 'Managed 0', of type iPad (6th generation) running iOS 12
>> 
>> While I haven’t given an example of a generic iPhone expected result, all the examples of iPad expected results have been for a generic iPad; ‘iPad (6th generation)’ would be an example of a non-generic iPad. iPhones operate under a similar system: just as an ‘iPad (6th generation)’ uses generic iPad expectations, an ‘iPhone 7’ would use generic iPhone expectations unless a more specific expected result was available.
> 
> I’m thinking of this from the perspective of someone navigating the test directory manually, rather than the perspective of the bots. Tools can clearly deal with any layout we come up with, so it should be optimized first for human use.
> 
>> 
>>> 
>>> But of course, while a device hierarchy could be fit under a flat notion of OS, the trick is how to fit it with a sequence of OS versions. Using directories for OS versions but filename variations for device classes seems weird to me, and maybe almost backwards. I’d expect many device class differences to be permanent (iPhone vs iPad for example), while OS version differences may be transitory, in that they are often quirks of an older OS that will not matter once we no longer support that OS.
>>> 
>>> I think it’s worth thinking through all the variations that would be needed for a few real tests (ideally ones that already vary by OS version but which would also vary by device type) and make a single model that makes sense.
>> 
>> I agree with the assessment that device type differences are more permanent while OS version differences are transitory. That’s why we’ve used the hierarchy for OS version: it allows us to collect transitory expectations together in a few directories. I don’t think we want to pile onto our existing hierarchy for iOS; this is what it looks like:
> 
>> 
>> 	platform/ios-simulator-12-wk2
>> 	platform/ios-simulator-12
>> 	platform/ios-simulator-wk2
>> 	platform/ios-simulator
>> 	platform/ios-12
>> 	platform/ios-wk2
>> 	platform/ios
>> 	platform/wk2
> 
> OS versions are transitory, but OSes likely are not. Ditto for WebKit models (legacy vs modern), and for simulator vs non-simulator.
> 
> And I guess this highlights another dimension of variation, modern webkit vs. legacy webkit. Even with all these explanations, I don’t get why we would handle variation by platform, OS version and webkit variant one way, and device type a totally different way. It just seems random.
> 
> I guess “delete whole directory at once” is one concrete argument, but it doesn’t really align with the difference here. We might delete “ios-12”, but we are unlikely to delete “ios” results any time soon.
> 
> We’re unlikely to delete “ipad” or “iphone” results any time soon, but if we had “iphone-SE” results then they would age out eventually. Modern WebKit vs Legacy WebKit is a difference likely to persist for a while, but if it was removed, we’d want to delete the Legacy WebKit results (which are not in a special directory) rather than the Modern WebKit results (which are).
> 
> 
> 
>> 
>> I also don’t think that device type really fits the hierarchy model. Conceptually, device type is a parallel idea to OS version. In most cases, we would expect the results for a given iPad test to be the same for both iOS Simulator and iOS Device across all versions. It seems unwise to force the concept of device type and OS version into the same idiom.
> 
> Device types clearly have their own hierarchy, which doesn’t align with the OS version inheritance we’ve created, or with the side features like “simulator” or “wk2” that we’ve appended.
> 
>> 
>>> 
>>> Maybe we should just use filename variations for everything, since that naturally expresses independent variation along multiple dimensions, while directories can only represent a single hierarchy. The trick then would be figuring out the priority order. If I have <test-name>-expected-ios-ipad.txt and <test-name>-expected-ios12.txt, then which is the right one to use on an iOS 12 iPad? Maybe we could have a convention to make ambiguous variation like this an error, or else decide whether OS version or device should take priority.
>> 
>> I think appending OS version to expected results will greatly complicate gardening test results. It’s quite useful to be able to move around entire directories which correspond to OS version. Since OS version is always tied to platform, I think both of these need to remain directories as they are now.
> 
> Explain to me why ios, ios-12, ios-simulator and ios-wk2 are all directories that often need to be moved, but ios-ipad would not be. I’m not seeing the category difference here. We have a number of conditions that can affect test results; I don’t see how device is categorically different from the various other conditions.

A tangential comment: ios, ios-simulator or ios-wk2 will not need to be moved regularly. It is worth noting that in the relatively recent past, ios-simulator was renamed to ios and a new ios-simulator directory was created to support on-device testing. It’s conceivable that we would want to do something similar to ios (or at least parts of ios) for ios-mac or watchos.

I think that we have 3 big reasons that device type is different from our other conditional test results.

The first is related to the ‘platforms map to binaries’ bit. A foundational assumption I’m operating on, based on discussion with contributors, is that running ‘run-webkit-tests --ios-simulator’ should encompass all iOS device types. If we used the current scheme to do this, every run of ‘run-webkit-tests --ios-simulator’ would contain multiple different hierarchies for expected results. This would end up looking a bit like running WebKitLegacy layout tests alongside WebKit layout tests. Given that contributors already have a tough time parsing our hierarchies, having to do so multiple times in a single test run seems very bad.

The second reason is that placing device types in with the rest of our expectations will either be extremely verbose or break our existing inheritance logic. To explain this, I’m going to give a few examples. The simplest way to embed device type would be something like this:

	platform/ios-simulator-12-wk2
	platform/ios-simulator-wk2
	platform/ios-simulator
	platform/ios-12
	platform/ios-ipad
	platform/ios-wk2
	platform/ios
	platform/wk2

This is the layout that could break our existing inheritance logic, because we don’t have a way to handle a bug that is specific to, say, an iPad running iOS 12. To support that case (and related problems, such as a bug specific to iPad on simulator), we would need something like this:

	platform/ios-simulator-12-wk2-ipad
	platform/ios-simulator-12-wk2
	platform/ios-simulator-wk2-ipad
	platform/ios-simulator-wk2
	platform/ios-simulator-ipad
	platform/ios-simulator
	platform/ios-12-ipad
	platform/ios-12
	platform/ios-ipad
	platform/ios-wk2
	platform/ios
	platform/wk2

Actually having these directories isn’t awful, but forcing contributors to reason about this hierarchy every time they run layout tests for iOS seems particularly bad.
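
To make the growth concrete, here is a minimal Python sketch that inserts an iPad variant ahead of every iOS directory in a search path. The function name and the ordering rule are assumptions for illustration, not webkitpy's actual search-path logic, and the output only roughly matches the hand-written list above.

```python
# Hypothetical sketch: this is not webkitpy's real search-path code.
BASE_SEARCH_PATH = [
    "platform/ios-simulator-12-wk2",
    "platform/ios-simulator-wk2",
    "platform/ios-simulator",
    "platform/ios-12",
    "platform/ios-wk2",
    "platform/ios",
    "platform/wk2",
]


def with_device_type(search_path, device_type):
    """Insert a device-specific directory ahead of every iOS directory."""
    expanded = []
    for directory in search_path:
        # Assumption: only iOS directories get a device-specific variant.
        if directory.startswith("platform/ios"):
            expanded.append(f"{directory}-{device_type}")
        expanded.append(directory)
    return expanded


if __name__ == "__main__":
    for directory in with_device_type(BASE_SEARCH_PATH, "ipad"):
        print(directory)
```

Under this rule the seven base directories become thirteen, and each further device type would add another six.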

The final reason is simple: scale. For reference, we have ~900 expectations in platform/ios and ~1500 in platform/mac. So far, I’ve found fewer than 100 tests (out of our 50,000) that care about device type. This problem is niche enough that I don’t think most contributors will need to reason about device differences when writing and running layout tests. Adding device type to the inheritance hierarchy forces contributors to reason about device type.

I think the real question here is whether we agree on the assumption that running ‘run-webkit-tests --ios-simulator’ should encompass all iOS device types. If the answer is ‘no’, then we can just deal with the ugliness described in the second reason above for whichever devices we care about. If, however, we want ‘run-webkit-tests --ios-simulator’ (and ‘run-webkit-tests --ios-device’, although it is far less important) to support any iOS device type, we need a way to reason about different expected results within a single instantiation of run-webkit-tests. I am of the opinion that we will regret forcing the caller of run-webkit-tests to think about which iOS device they care about.
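
As a rough sketch of how a single run could resolve device-specific results with the <test-name>-expected-<device-type>.txt naming discussed earlier in this thread: the device slugs, the fallback table, and the choice to check device-specific names before the generic name within each directory are all assumptions for illustration, not the behavior of the patch in bug 192162.

```python
# Hypothetical sketch: names and slugs are illustrative, not webkitpy's API.

# Each device type lists itself, followed by the more generic types it "is a".
DEVICE_FALLBACKS = {
    "iphone-7": ["iphone-7", "iphone"],
    "iphone-se": ["iphone-se", "iphone"],
    "ipad": ["ipad"],
}


def candidate_names(test_name, device_type):
    """Yield expected-result file names, most specific first."""
    for device in DEVICE_FALLBACKS.get(device_type, []):
        yield f"{test_name}-expected-{device}.txt"
    # The generic result is always the last candidate.
    yield f"{test_name}-expected.txt"


def find_expected(test_name, device_type, search_path, existing_files):
    """Return the first matching path, or None if no result exists."""
    # Assumption: directory order takes priority over device specificity.
    for directory in search_path:
        for name in candidate_names(test_name, device_type):
            path = f"{directory}/{name}"
            if path in existing_files:
                return path
    return None
```

With this shape, the caller passes the same search path for every device type in the run; only the file-name candidates change per device.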

Thanks,
Jonathan

> 
> Regards,
> Maciej
