[webkit-dev] Enable REQUEST_ANIMATION_FRAME on all ports? (was Re: ENABLE flag cleanup strawman proposal)

Tue Sep 27 10:34:09 PDT 2011

On Sep 26, 2011, at 9:48 PM, James Robinson wrote:

> 
> 
> On Sun, Sep 25, 2011 at 6:52 PM, Darin Adler <darin at apple.com> wrote:
> On Sep 25, 2011, at 12:20 AM, James Robinson wrote:
> 
> > The TIMER based support for RAF is very new (only a few weeks old) and still has several major bugs. I'd suggest letting it bake for a bit before considering turning it on for all ports.
> 
> Got it.
> 
> > Fundamentally I don't think this feature can be implemented reasonably well with just timers, so port maintainers should take a really careful look at the level of support they want to have for this feature when deciding if they want to support it.
> 
> This may contradict the recommendation above. If the timer-based version is too low quality then maybe we shouldn’t put ports in the position of shipping with a substandard implementation rather than simply having the feature omitted.
> 
> Perhaps if I expand on my concerns a bit it'll be clearer what the right option is.
> 
> The goal of requestAnimationFrame is to allow web authors to have high-quality script-driven animations.  To use a concrete example, when playing Angry Birds (http://chrome.angrybirds.com/) and flinging a bird across the terrain, the rAF-based animation should move the bird at a uniform rate across the screen at the same framerate as the physical display without hitches or interruptions.  An additional goal is that we shouldn't do any unnecessary work for frames that do not show up on screen, although it's generally necessary to do this in order to satisfy the first goal as I'll show below.  There are two main things that you need in order to achieve this that are difficult or impossible to do with a WebCore Timer: a reliable display-rate-aligned time source, and a source of feedback from the underlying display mechanism.
> 
> The first is easiest to think about with an example.  When the angry bird mentioned above is flying across the screen, the user should experience the bird advancing by the same amount every time their display refreshes.  Let's assume a 60Hz display and a 15ms timer (as the current REQUEST_ANIMATION_FRAME_TIMER code uses), and furthermore assume (somewhat optimistically) that every frame takes 0ms to process in JavaScript and 0ms to display.  The screen will update at the following times (in milliseconds): 0, 16 2/3, 33 1/3, 50, 66 2/3, 83 1/3, 100, etc.  The visual X position of the bird on the display is directly proportional to the time elapsed when the rAF handler runs, since it's interpolating the bird's position, and the rAF handler will run at times 0, 15, 30, 45, 60, etc.  We can thus determine the visual X position of the bird for each frame:
> 
> Frame 0, time 0ms, position: 0, delta from last frame: n/a
> Frame 1, time 16 2/3ms, position: 15, delta from last frame: 15
> Frame 2, time 33 1/3ms, position: 30, delta from last frame: 15
> Frame 3, time 50ms, position: 45, delta from last frame: 15
> Frame 4, time 66 2/3ms, position: 60, delta from last frame: 15
> Frame 5, time 83 1/3ms, position: 75, delta from last frame: 15
> Frame 6, time 100ms, position: 90, delta from last frame: 15
> Frame 7, time 116 2/3ms, position: 105, delta from last frame: 15
> Frame 8, time 133 1/3ms, position: 120, delta from last frame: 15
> Frame 9, time 150ms, position: 150, delta from last frame: 30 (!)
> Frame 10, time 166 2/3ms, position: 165, delta from last frame: 15
> Frame 11, time 183 1/3ms, position: 180, delta from last frame: 15
> Frame 12, time 200ms, position: 195, delta from last frame: 15
> 
> What happened at frame 9?  Instead of advancing by 15 milliseconds' worth, the bird jumped forward by twice the normal amount.  Why?  We ran the rAF callback twice between frames 8 and 9 - once at 135ms and once at 150ms.  What's actually going on here is we're accumulating a small amount of drift on every frame (1.66666... milliseconds, to be precise) between when the display is refreshing and when the callbacks are being invoked.  This has to catch up sometime so we end up with a beat pattern every (16 2/3) / abs(16 2/3 - 15) = 10 frames.  The same thing happens with a perfect 16ms timer every 25 frames, or with a perfect 17ms timer every 50 frames.  Even a very close timer will produce these regular beat patterns and as it turns out the human eye is incredibly good at picking out and getting annoyed by these effects in an otherwise smooth animation.
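
For anyone who wants to check the numbers in the quoted table, here is a rough sketch in Python (not WebKit code). It assumes the same idealized model as the example: a perfect 60Hz display, a perfect timer, zero-cost callbacks, and a position equal to the time of the most recent timer tick. The function names are made up for illustration.

```python
from fractions import Fraction


def positions_at_refresh(timer_ms, refresh_hz=60, frames=13):
    """Position visible at each display refresh under the idealized model.

    The visible position is the time of the last timer tick at or before
    the refresh. Exact fractions avoid floating-point rounding at 16 2/3ms.
    """
    period = Fraction(1000, refresh_hz)
    shown = []
    for frame in range(frames):
        refresh_time = frame * period
        last_tick = (refresh_time // timer_ms) * timer_ms
        shown.append(int(last_tick))
    return shown


def beat_period_frames(timer_ms, refresh_hz=60):
    """Number of frames between glitches: period / abs(period - timer)."""
    period = Fraction(1000, refresh_hz)
    return period / abs(period - timer_ms)


positions = positions_at_refresh(15)
deltas = [b - a for a, b in zip(positions, positions[1:])]
print(positions)
print(deltas)
print([int(beat_period_frames(t)) for t in (15, 16, 17)])
```

The double-sized delta lands between frames 8 and 9, and the beat periods for 15ms, 16ms and 17ms timers come out as in the quoted text.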

I generally agree with your analysis, but I believe your example is misleading. "Skipping a frame" would only cause the bird to jump by 30 units rather than 15 if you were simply adding 15 units to its position on every call to rAF. But that would make the rate of movement of the bird change based on the rate at which rAF is called, and that would be poor design. If an implementation decided to call rAF at 30ms intervals (due to system load, for instance) then the bird would appear to move half as fast, which isn't what you want.

Assuming you're basing the position on the time at which the animation started, then the bird's apparent rate will not change depending on the rate at which rAF is firing.
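
To make the distinction concrete, here is a small Python sketch (again just an illustration, with made-up function names and an assumed speed of one unit per millisecond). It compares adding a fixed step on every callback with deriving the position from elapsed time, for callbacks arriving every 15ms and every 30ms.

```python
def per_call_position(call_times_ms, step=15):
    """Add a fixed step on every callback after the first, ignoring time."""
    return step * (len(call_times_ms) - 1)


def time_based_position(call_times_ms, speed_per_ms=1.0):
    """Derive the position from the time elapsed since the animation began."""
    start = call_times_ms[0]
    return speed_per_ms * (call_times_ms[-1] - start)


fast = list(range(0, 301, 15))  # callbacks every 15ms for 300ms
slow = list(range(0, 301, 30))  # callbacks every 30ms for 300ms

print(per_call_position(fast), per_call_position(slow))
print(time_based_position(fast), time_based_position(slow))
```

With the per-call step the bird covers half the distance when callbacks arrive half as often; with the time-based version it ends up in the same place either way.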

With that said, I agree with you that there will still be a visual glitch in the current implementation. But what's actually happening is that the timestamp we're sending to rAF is wrong. We're sending the current time. Depending on when rAF fires relative to the display refresh, the timestamp might be as much as 16ms behind the time the frame is actually seen. If you're basing motion on this timestamp, there will be an occasion when one frame will have a timestamp that is very close to the display time and the next will have a timestamp that is 15ms or so behind. That's why the glitch is happening.

So I don't believe this has anything to do with Timers per se, but with the wrong timestamp we happen to be sending to rAF. We knew this would happen and we chose this method because it gave us a nice simple first implementation. I still believe it's a fine implementation and it is platform independent, so it allows all the ports to support rAF.

> 
> For this reason, you really need a precise time source that is tied in to the actual display's refresh rate.  Not all displays are exactly 60Hz - at smaller form factors 50 or even 55Hz displays are not completely unheard of.  Additionally the normal clock APIs aren't always precise enough to stay in sync with the actual display - particularly on Windows it's really hard to find a clock that doesn't drift around all over the place.
> 
> The above analysis assumes that all calls are infinitely fast and there's no real contention for system resources.  In practice, though, this is rarely the case.  It's not uncommon that the system will temporarily get overloaded and have to make tradeoffs between maintaining a high framerate and remaining responsive to user input.  In Chromium, we have some logic to ensure that we load balance between handling input events and painting to ensure that processing one type doesn't completely starve the other.  In a multi-process environment, such as WebKit2 or Chromium, there needs to be coordination between the two processes in the non-composited path in order to paint a bitmap and get it onscreen.  If this logic is all operating completely independently from the rAF scheduling then it's very easy to end up triggering callbacks at a time when the browser can't produce a frame anyway, or painting without invoking the rAF callbacks even if they should be invoked.  A related issue is what to do when the rAF callbacks themselves cause us to be unable to hit our target framerate - for example by invalidating some portion of the page that is very expensive to repaint.  In that case, the ideal behavior is to throttle down the rAF callback rate to what we can sustain, which requires some feedback from the rest of the graphics stack.

I think the issue of supplying rAF with accurate timestamps is independent of whatever feedback mechanism an implementation uses to do the throttling. I'm sure those heuristics will improve over time. But the first step is to supply rAF with an accurate timestamp. I've opened https://bugs.webkit.org/show_bug.cgi?id=68911 for this. My intention is to create a call, similar to scheduleAnimation(), but which simply asks platform-specific code for a time estimate of when the next frame will be visible. That can be used not only as the timestamp sent to rAF, but also as the basis for when the next call to rAF is made. That should avoid any excessive calls to rAF.
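
As a rough illustration of why the timestamp matters, here is a Python sketch under the same idealized 60Hz model. The function estimate_next_frame_time() is a hypothetical stand-in for the platform call described above, not an existing WebKit API, and it assumes refreshes fall on exact multiples of the refresh period starting at time 0. The callback still fires from a 15ms timer, but the timestamp it receives is the estimated display time.

```python
import math
from fractions import Fraction


def estimate_next_frame_time(now_ms, refresh_hz=60):
    """Hypothetical platform hook: first refresh time at or after now_ms."""
    period = Fraction(1000, refresh_hz)
    return math.ceil(Fraction(now_ms) / period) * period


def shown_timestamps(timer_ms, refresh_hz=60, frames=13):
    """Timestamp behind the content visible at each display refresh.

    The callback runs on a plain timer, but it is handed the estimated
    display time rather than the time at which it happened to fire.
    """
    period = Fraction(1000, refresh_hz)
    shown = []
    for frame in range(frames):
        refresh_time = frame * period
        last_tick = (refresh_time // timer_ms) * timer_ms
        shown.append(estimate_next_frame_time(last_tick, refresh_hz))
    return shown


stamps = shown_timestamps(15)
deltas = {b - a for a, b in zip(stamps, stamps[1:])}
print(deltas)
```

In this model every visible frame advances by exactly one refresh period, so motion based on the timestamp no longer shows the jump at frame 9. The callback at 135ms still does work that never reaches the screen, which is where using the same estimate to schedule the next call would help.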

For Mac, I plan to look into adding a displayLink thread which will maintain a timestamp value tied to refresh. I didn't try using a displayLink at first because I initially thought I'd use it to actually drive the firing of the callback, which would have been complicated and required a lot of communication between the threads. Just having the displayLink maintain a timestamp means I just need to provide thread-safe access to that value. Hopefully that will keep overhead low but will achieve the synchronization goal.
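
The shape of that idea, sketched in Python rather than in terms of the real display link API: one thread publishes the latest refresh timestamp and the main thread only reads it under a lock. The class and function names are hypothetical, and an ordinary thread stands in for the display link callback.

```python
import threading


class FrameTimestamp:
    """Latest display refresh time, shared between two threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value_ms = 0.0

    def update(self, value_ms):
        # Called from the display thread on every refresh.
        with self._lock:
            self._value_ms = value_ms

    def read(self):
        # Called from the main thread when a timestamp is needed for rAF.
        with self._lock:
            return self._value_ms


def display_thread(shared, refresh_count, period_ms):
    # Stand-in for the display link callback: publish each refresh time.
    for frame in range(1, refresh_count + 1):
        shared.update(frame * period_ms)


shared = FrameTimestamp()
worker = threading.Thread(target=display_thread, args=(shared, 10, 1000 / 60))
worker.start()
worker.join()
print(shared.read())
```

The only shared state is a single value, so the two threads never need to hand work to each other.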

> 
> Architecturally I think that WebCore is the wrong place to address these issues.  WebCore is responsible for generating repaint invalidations and passing them out to the WebKit layer via ChromeClient, and it's responsible for painting content when the WebKit layer asks it to.  Otherwise, all of the frame scheduling logic that would be relevant to rAF lives outside of WebCore in the port-specific layers.  Determining a valid clock source for a given graphics stack and deciding when to produce new frames are also highly port-specific.
> 
> Note that I don't think that using a timer is necessarily evil in all cases.  With some rendering architectures or graphics libraries, it may not be possible to produce a better solution.  We still use a timer in Chromium in our non-composited path, although it is integrated with our frame scheduling and back pressure logic.  Additionally a timer is quite easy to code up and works "pretty well" most of the time (although you can be sure that your pickier users will complain).  There are also some benefits to providing this API even without great scheduling - for example a port can throttle the rAF callbacks for non-visible content or tabs without the backwards compat issues that doing the same thing for setTimeout() would have, leading to dramatically lower power and resource consumption in some cases.
> 
> I still think it's dangerous to provide this as a default for all ports to fall back on, because I worry that if it's there ports will use it without considering the issues I mention above.

I don't think you need to worry. The current REQUEST_ANIMATION_FRAME_TIMER implementation does what it was intended to do - provide a platform-independent implementation of requestAnimationFrame. It provides callbacks at an even rate and avoids excessive CPU consumption. I don't think the occasional animation glitch is a major flaw. It's just an issue that needs to be addressed on a platform-specific basis to improve animation quality.

-----
~Chris
cmarrin at apple.com