[webkit-dev] Limiting slow unload handlers (Re: Back/forward cache for pages with unload handlers)

Darin Fisher darin at chromium.org
Wed Sep 16 22:33:19 PDT 2009


On Wed, Sep 16, 2009 at 9:59 PM, Maciej Stachowiak <mjs at apple.com> wrote:

>
> On Sep 16, 2009, at 4:49 PM, Darin Fisher wrote:
>
>
>
> On Wed, Sep 16, 2009 at 2:21 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>
>>
>> On Sep 16, 2009, at 1:58 PM, John Abd-El-Malek wrote:
>>
>>
>> Either way though, I don't think it'll work in this case.  I've seen pages
>> have 8 beforeunload/unload handlers each sleeping for 200ms, just so that
>> they don't have 1 handler that'll trip the slow script detection.  If we
>> decrease the timeout for unload handlers, they would just increase the
>> number of registered handlers proportionally.
>>
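(For concreteness, the evasion pattern John describes might look roughly
like the sketch below; the 200 ms figure and the handler count come from
his observation, everything else is invented for illustration.)

    // Busy-wait for roughly `ms` milliseconds of wall-clock time.
    function spin(ms) {
      var start = new Date().getTime();
      while (new Date().getTime() - start < ms) {
        // burn time; each getTime() call is, in effect, one unit of "work"
      }
    }

    // Register many short handlers instead of one long one, so that no
    // single handler runs long enough to trip a per-handler slow-script
    // check.
    for (var i = 0; i < 8; i++) {
      window.addEventListener('unload', function () { spin(200); }, false);
    }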
>>
>> I think that setting an upper bound on the amount of time that can be
>> spent in all unload handlers is a better solution than hacking the behavior
>> of the Date API. Because (a) It's less likely to have unexpected side
>> effects; and (b) there's no way for content authors to work around it, so we
>> are less likely to end up in an "arms race" situation. There were worries
>> expressed that swapping or context switching might trigger false positives,
>> but I expect this is unlikely in practice, based on our experience with the
>> slow script dialog.
>>
>
>
> I too would like to avoid an arms race, but...
>
> I disagree.  You'll get false positives at an unacceptable rate, especially
> if you try to tamp down the interval to a small fraction of a second.  We
> saw these problems in spades with Chrome's hang monitor (detecting
> unresponsive subprocesses), and we had to push the interval to something
> larger than we would have liked.
>
>
> Interesting - I don't recall ever seeing false positives with Safari's
> "slow script" detection. Maybe due to our particular timeout design (see
> below).
>
>
> Counting work instead of time is much more robust.  The getTime call
> count is a measure of work, albeit approximate.
>
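(To make "counting work" concrete: the sketch below approximates, at the
page level, what an engine-level clamp on the Date API could do. The
budget and the fast-forward step are invented values, and a real
implementation would live inside the engine, not in page script.)

    // Illustrative only: after a budget of getTime() calls, the reported
    // time jumps forward, so date-based busy loops exit early no matter
    // how the spinning is split across handlers. Date.now() and valueOf()
    // would need the same treatment.
    (function () {
      var realGetTime = Date.prototype.getTime;
      var callBudget = 1000;   // invented "work" allowance
      var skewMs = 0;
      Date.prototype.getTime = function () {
        if (--callBudget < 0) skewMs += 10000;  // fast-forward the clock
        return realGetTime.call(this) + skewMs;
      };
    })();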
>
> The way JavaScriptCore execution time limit works is that the clock doesn't
> start ticking until JS execution begins. So it's unlikely that a full
> timeout cycle will occur while the process is swapped out or paused, since
> the clock won't start running until the process is actually executing
> JS. And the actual timeout check is only done occasionally (every N loop
> back edges or function calls, for some value of N). So even if there's a
> context switch in the middle of JS execution, it's unlikely that JS
> processing will be terminated immediately upon return. So maybe a different
> solution is appropriate for JavaScriptCore than V8.
>
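(A rough sketch of the scheme Maciej describes, in pseudo-JavaScript;
N and TIMEOUT_MS are placeholders, not JavaScriptCore's actual values.)

    var TIMEOUT_MS = 10000, N = 1000;    // placeholder values
    var executionStart, ticks;

    function enterJSExecution() {        // clock starts when JS begins
      executionStart = new Date().getTime();
      ticks = N;
    }

    function onBackEdgeOrCall() {        // run on loop back edges / calls
      if (--ticks > 0) return;           // cheap path: no clock read
      ticks = N;
      if (new Date().getTime() - executionStart > TIMEOUT_MS) {
        throw new Error('slow script: terminating');
      }
    }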
>
Consider what happens if garbage collection runs during JS execution.  That
could cause portions of the VM's memory to be paged back into RAM, which
could cause significant wall-clock delay.  Do you discount time spent in GC?



>
> Also, it is very important to note that content authors are not entirely in
> control here.  A content author may have some ads on their page, and it may
> be the ad that is delivering the bad unload handler.  If we applied a limit
> to all unload handlers, then we'd be punishing both the content author as
> well as the ad provider.  That doesn't seem fair to the content author, who
> might have a legit unload handler.
>
>
> As long as the author installs their unload handlers before the ad does,
> they won't have a problem.
>

Good point.



>
> To help us decide whether (and how) to tackle this for non-V8 ports of
> WebKit, can the Chrome team share the data they have on the following:
>
> (1) Frequency of pages doing a busy loop in an unload handler. I've heard
> it's common but no specific data.
> (2) A few examples of URLs to pages that do this, so we can study what they
> are doing and why.
> (3) Frequency of a date-based loop being used to implement the busy loop.
> (4) Average additional delay imposed by unload busy loops.
> (5) Proportion of sites that use busy looping in unload solely for link
> tracking and not for any other purpose.
>
>
You can find links to example sites in the Chromium bug report:
http://code.google.com/p/chromium/issues/detail?id=7823

The bug contains some distilled data.

By the way, the issue is not with troublesome sites but with troublesome ad
networks and/or ad producers.  I believe the web sites themselves are just
victims here.



> The reason I'm interested in (1)-(4) is to determine if doing nothing is
> really worse than doing something hackish, as suggested by Adam.
>
> The reason I'm interested in (5) is to determine if <a ping> is an adequate
> replacement. I think if we break existing techniques, we need to give
> authors a replacement. unload fires when the user leaves the page in any way
> whatsoever, including closing the window or typing in the location field. So
> sites could use I/O in unload plus a busy loop to track the amount of time
> the user spent on the page, or to save state. If sites are doing that, then
> <a ping> won't be an adequate replacement, so we'll have to do something
> like Adam's suggestion to guarantee completion of I/O that is initiated in
> the unload handler. The reason I think it's possible sites care about more
> than just link tracking is that if that's all they care about, they could
> just use redirect links, and get a better user experience today than busy
> looping in unload. If sites are not using redirects for link tracking today,
> why would they use <a ping> in the future?
>
>
The reason I don't think they are using it for critical data is that they
have a timeout.  If they were trying to persist critical data, then they
would just use a synchronous XHR.  In this case, they are trying to increase
the probability of successfully sending a ping by giving themselves a few
hundred milliseconds.
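
For concreteness, the two patterns side by side (the URLs and payload are
hypothetical):

    window.onunload = function () {
      // (a) Critical data: a synchronous XHR blocks unload until the
      //     server has the data; no busy loop is needed.
      var xhr = new XMLHttpRequest();
      xhr.open('POST', 'http://example.com/save-state', false); // async=false
      xhr.send('state=...');

      // (b) Best-effort ping: fire an image request, then spin for a few
      //     hundred milliseconds to raise the odds that it gets out
      //     before the page is torn down.
      (new Image()).src = 'http://example.com/ping';
      var start = new Date().getTime();
      while (new Date().getTime() - start < 300) { /* busy-wait */ }
    };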

-Darin