[webkit-dev] SharedWorker design doc

Fri Apr 17 21:55:45 PDT 2009

Hi all,
I've put together a proposal for implementing SharedWorkers in WebKit. The
worker lifecycle issues turned out to be thornier than I originally
expected, mostly because the implications of the spec aren't obvious right
away (to me, anyway :)

Any feedback would be appreciated, especially for some of the
cross-threading and worker lifecycle issues.

Cheers,

-atw

WebKit SharedWorker design
Shared workers (http://dev.w3.org/html5/workers/#shared-workers) are similar
to the existing dedicated workers, with a few API differences.

   - SharedWorkers are shared - if an application creates a SharedWorker()
   while there's already a non-closing instance of that worker anywhere in the
   browser, then it gets a reference to the existing worker thread.
   - All communication is via explicit MessagePorts. SharedWorkers receive
   new MessagePorts via onconnect() rather than raw messages via onmessage().
   - SharedWorkers have the same lifecycle as dedicated workers according to
   the spec. SharedWorkers use explicit MessagePorts for communication instead
   of implicit MessagePorts like dedicated Workers, but the issues are the same
   (especially since dedicated Workers can use MessagePorts as well, if
   entangled ports are sent to/from the worker via postMessage()). New code
   will be needed here, since WebKit doesn't currently implement all aspects of
   the worker lifecycle (it's not needed yet because sending MessagePorts to
   workers is not yet supported).
   - SharedWorkers have explicit access to the ApplicationCache APIs, while
   dedicated Workers merely inherit the ApplicationCache from their parent
   window.

From the browser's point of view, SharedWorkers are largely
indistinguishable from dedicated Workers. They run in their own
SharedWorkerThread with a SharedWorkerContext, both of which derive from
common base classes shared with dedicated WorkerThreads/WorkerContexts. In
Chrome, SharedWorkers will run in a separate process (not in the renderer
process), just like dedicated Workers.

Creating SharedWorkers
The core of our support for SharedWorkers is the
SharedWorkerRepository, which provides a thread-safe interface to a map
whose keys are a combination of SecurityOrigin and workerName, and whose
values are references to SharedWorkerContext objects. The
SharedWorkerRepository is also responsible for tracking which SharedWorker
objects are associated with a given SharedWorkerContext, for the purposes of
sending close events when the worker shuts down.

This section describes the default WebKit implementation of the repository -
Chrome will provide its own implementation whose behavior is similar, but
whose internals are different because it runs in the browser process
(required because it's the only way to provide the necessary
cross-render-process synchronization). We define the
SharedWorkerContextProxy as an interface to allow the Chrome implementation
to vary - there is no similar SharedWorkerObjectProxy interface since this
would only be used internally by the Repository which will be
Chrome-specific code anyway.

class SharedWorkerRepository {
public:
  // Does a synchronous get-or-create of a worker with the specified name.
  static SharedWorkerContextProxy* addWorker(SharedWorker* worker,
                                             SecurityOrigin* origin,
                                             const String& url,
                                             const String& name);

  // Marks a worker as closing (removes it from the map). A close event is
  // propagated to all SharedWorker objects associated with this context.
  static void workerThreadClosed(SharedWorkerThread* worker);

  // TODO: Add way to send console messages to parent window contexts?
};

class SharedWorkerContextProxy {
public:
  // Sends a connect event to the worker passing this port.
  void connect(MessagePort* port);

  // Invoked when a SharedWorker object is destroyed. This causes the
  // SharedWorker to be removed from the repository.
  // Open question: if we have a close event in the queue for this worker,
  // will that be enough to keep it from being GC'd? Or is it possible for
  // the worker to get deleted while there are events queued for it?
  void workerObjectDestroyed(SharedWorker* worker);
};

As noted above, the SharedWorkerRepository refers (via the
SharedWorkerContextProxy interface) to the set of all SharedWorkerContext
objects whose *closing* flag is false, in addition to all SharedWorker
objects associated with each SharedWorkerContext.

SharedWorkerRepository::addWorker()
The SharedWorker constructor passes a pointer to the newly-created object
into SharedWorkerRepository::addWorker(). This grabs the repository mutex,
and then performs the following steps:

If SharedWorkerContextProxy for passed origin/name does not exist in map:
    create new SharedWorkerContextProxy
If SharedWorkerContextProxy has no worker thread:
    initiate code load (within current context).
    Open issue: Do we need to do anything special here re: the
    ApplicationCache, to make sure we load from the most recent cache
    rather than from the current context's cache?
Add SharedWorker to list of objects associated with SharedWorkerContextProxy
Return SharedWorkerContextProxy*
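
The get-or-create steps above can be sketched roughly as follows. This is
only an illustration, not the proposed WebKit code: Repository,
ContextProxy and the plain-string origin/name key are simplified stand-ins
I'm assuming for the example, and "initiate code load" is reduced to
setting a flag.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the WebKit types named in the proposal.
struct SharedWorker {};

struct ContextProxy {
    bool hasWorkerThread = false;  // set once the worker thread exists
    bool loadStarted = false;      // placeholder for "initiate code load"
    std::vector<SharedWorker*> workerObjects;  // associated SharedWorkers
};

class Repository {
public:
    // Synchronous get-or-create keyed on (origin, name).
    ContextProxy* addWorker(SharedWorker* worker,
                            const std::string& origin,
                            const std::string& name)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto key = std::make_pair(origin, name);
        std::unique_ptr<ContextProxy>& slot = m_proxies[key];
        if (!slot)
            slot = std::make_unique<ContextProxy>();
        if (!slot->hasWorkerThread)
            slot->loadStarted = true;  // a real version would start the load
        slot->workerObjects.push_back(worker);
        return slot.get();
    }

private:
    std::mutex m_mutex;
    std::map<std::pair<std::string, std::string>,
             std::unique_ptr<ContextProxy>> m_proxies;
};
```

Two calls with the same origin and name return the same proxy, while a
different name yields a separate one.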

The SharedWorker constructor stores away a reference to the
SharedWorkerContextProxy. It then creates a new entangled MessagePort pair,
exposes one end via its *port* attribute and passes the other end into the
SharedWorkerContextProxy::connect() handler, then returns to the caller.

SharedWorker::notifyFinished() (code is loaded)

*When the code load is complete:*
*  if code load error:*
*    invoke MessagePort.close() on the port*
*    invoke app error handler directly on SharedWorker object*
*    call SharedWorkerContextProxy::workerObjectDestroyed() to remove association*
*    clear reference to SharedWorkerContextProxy*
*  else: // code load success*
*    call SharedWorkerContextProxy::scriptLoaded()*
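
A rough sketch of the two branches above, under my own assumptions:
LoadProxy, ClosablePort and SharedWorkerObject are hypothetical stand-ins,
and the load result is passed in as a bool rather than coming from a real
script loader.

```cpp
#include <functional>
#include <string>

struct SharedWorkerObject;

// Hypothetical proxy that only records what it was told.
struct LoadProxy {
    int associatedObjects = 0;
    bool scriptDelivered = false;
    void workerObjectDestroyed(SharedWorkerObject*) { --associatedObjects; }
    void scriptLoaded(const std::string&) { scriptDelivered = true; }
};

struct ClosablePort {
    bool closed = false;
    void close() { closed = true; }
};

struct SharedWorkerObject {
    LoadProxy* proxy = nullptr;
    ClosablePort port;
    std::function<void()> onerror;  // the app's error handler, if any

    void notifyFinished(bool loadFailed, const std::string& script)
    {
        if (!proxy)
            return;
        if (loadFailed) {
            port.close();
            if (onerror)
                onerror();
            proxy->workerObjectDestroyed(this);  // remove association
            proxy = nullptr;                     // clear reference
        } else {
            proxy->scriptLoaded(script);
        }
    }
};
```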

SharedWorkerContextProxy::scriptLoaded():

*Grab repository mutex*
*if workerThread == null:*
*    create workerThread*
*    pass in script*
*    send queued up connect events*

SharedWorkerContextProxy::connect():

This is responsible for sending the connect event to a given worker thread.
Like WorkerMessagingProxy::postMessageToWorkerContext(), it needs to handle
the case where the worker thread has not yet been created (waiting on script
to load):

*Grab repository mutex*
*if workerThread != null:*
*  send connect event to worker thread*
*else:*
*  add connect event to queue (sent in scriptLoaded() above).*

To send a connect event to the worker thread, we queue up a
SharedWorkerConnectTask. This task associates the MessagePort with the
worker's execution context (via MessagePort::attachToContext()) and then
invokes the worker's onconnect() handler on the worker thread.
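
To make the interplay between connect() and scriptLoaded() concrete, here
is a minimal sketch. The types are hypothetical stand-ins, and for brevity
it uses a per-proxy mutex where the proposal uses the repository mutex; a
"connect event" is reduced to recording the port on a fake worker thread.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <vector>

struct MessagePort { int id; };

// Stand-in for the worker thread: records the connect events it was sent.
struct WorkerThread {
    std::string script;
    std::vector<MessagePort*> connected;
    void postConnect(MessagePort* port) { connected.push_back(port); }
};

class ContextProxy {
public:
    // Sends the connect event now, or queues it until the script has loaded.
    void connect(MessagePort* port)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_thread)
            m_thread->postConnect(port);
        else
            m_pending.push_back(port);
    }

    // Creates the thread on the first successful load and flushes the queue.
    void scriptLoaded(const std::string& script)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_thread)
            return;  // another SharedWorker's load already won
        m_thread = std::make_unique<WorkerThread>();
        m_thread->script = script;
        for (MessagePort* port : m_pending)
            m_thread->postConnect(port);
        m_pending.clear();
    }

    // Unsynchronized accessor, only intended for single-threaded inspection.
    const WorkerThread* thread() const { return m_thread.get(); }

private:
    std::mutex m_mutex;
    std::unique_ptr<WorkerThread> m_thread;
    std::vector<MessagePort*> m_pending;
};
```

Ports passed to connect() before the script arrives are delivered in order
once scriptLoaded() runs; later ports go straight to the thread.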

Open issue: What about console/inspector messages generated by
SharedWorkers? Can we send them off to the console/inspector directly, or do
we have to expose an API on SharedWorkerRepository for forwarding them to a
SharedWorker's document (possibly to all associated documents)? In the case
of nested workers, do we have to continue to fan out these console messages?
Seems like we might get loops as well if you have two shared workers
referring to one another.

Closing SharedWorkers
Shared workers can be closed through various means: by becoming
unreachable, through user action, or by invoking
SharedWorkerContext::close().

When a worker closes itself by calling close(), it is first disassociated
from the repository by invoking
SharedWorkerRepository::workerThreadClosed(), which grabs the repository
mutex and performs the following actions:

*Get SharedWorkerContextProxy associated with WorkerThread*
*For each SharedWorker associated with this SharedWorkerContextProxy object:*
*    Queue up close event (SharedWorker->scriptExecutionContext()->postTask())*
*Remove SharedWorkerContextProxy from the map*

This ensures that all existing SharedWorker objects receive the proper
close notifications, and that no new SharedWorker objects become associated
with the closing SharedWorkerContext.
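
A minimal sketch of the actions listed above, with several simplifications
that are mine rather than the proposal's: the proxy is passed in directly
instead of being looked up from the SharedWorkerThread, ExecutionContext is
a toy task queue standing in for ScriptExecutionContext, and the "close
event" just sets a flag.

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a document's ScriptExecutionContext.
struct ExecutionContext {
    std::vector<std::function<void()>> tasks;
    void postTask(std::function<void()> task) { tasks.push_back(std::move(task)); }
    void runPendingTasks()
    {
        auto pending = std::move(tasks);
        tasks.clear();
        for (auto& task : pending)
            task();
    }
};

struct SharedWorker {
    ExecutionContext* context;
    bool sawClose = false;
};

struct ContextProxy {
    std::vector<SharedWorker*> workerObjects;
};

class Repository {
public:
    using Key = std::pair<std::string, std::string>;  // (origin, name)

    void add(const Key& key, ContextProxy* proxy)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_map[key] = proxy;
    }

    bool contains(const Key& key)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_map.count(key) != 0;
    }

    void workerThreadClosed(ContextProxy* proxy)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        // Queue a close event on each associated object's own context.
        for (SharedWorker* worker : proxy->workerObjects)
            worker->context->postTask([worker] { worker->sawClose = true; });
        // Remove the proxy so no new SharedWorker can attach to it.
        for (auto it = m_map.begin(); it != m_map.end();) {
            if (it->second == proxy)
                it = m_map.erase(it);
            else
                ++it;
        }
    }

private:
    std::mutex m_mutex;
    std::map<Key, ContextProxy*> m_map;
};
```

Note that the close events are only queued here; each SharedWorker sees its
event later, when its own context runs the task.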

Open issue: Is it OK for the repository to maintain explicit pointers to
objects like SharedWorker and send events via
workerObj->scriptExecutionContext()->postTask()? Is there a safer way to do
this (say, via some kind of wrapper, à la WorkerMessagingProxy)?

At this point, the SharedWorkerContext is left to manage its own demise, by
queueing a task that fires a close event at the worker global scope. Once
the close event has been fired, WorkerRunLoop.terminate() is invoked to drop
all remaining tasks for the worker and cause the thread to exit, freeing the
SharedWorkerContext. The "kill a worker" algorithm described in section 4.6
of the WebWorkers spec suggests that timeouts may be imposed by the
UserAgent for the close() handler as well as for any tasks that are
executing before the close task is executed. How can we enforce these
timeouts by aborting currently executing script?
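
The "fire close, then terminate" ordering can be illustrated with a toy,
single-threaded run loop. RunLoop and WorkerContext below are hypothetical
stand-ins (the real WorkerRunLoop is cross-thread), and the sketch does not
attempt to answer the timeout question.

```cpp
#include <deque>
#include <functional>
#include <utility>

class RunLoop {
public:
    void postTask(std::function<void()> task)
    {
        if (!m_terminated)
            m_tasks.push_back(std::move(task));
    }

    // Drops all remaining tasks and makes run() return.
    void terminate()
    {
        m_terminated = true;
        m_tasks.clear();
    }

    void run()
    {
        while (!m_terminated && !m_tasks.empty()) {
            auto task = std::move(m_tasks.front());
            m_tasks.pop_front();
            task();
        }
    }

    bool terminated() const { return m_terminated; }

private:
    bool m_terminated = false;
    std::deque<std::function<void()>> m_tasks;
};

struct WorkerContext {
    RunLoop runLoop;
    bool closing = false;
    std::function<void()> onclose;  // the worker's close handler, if any

    void close()
    {
        if (closing)
            return;
        closing = true;
        // Tasks queued before this one still run; anything after is dropped.
        runLoop.postTask([this] {
            if (onclose)
                onclose();
            runLoop.terminate();
        });
    }
};
```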

Reuse/Refactoring of existing dedicated Worker code
Both WorkerThread and WorkerRunLoop can be re-used nearly entirely. We'll
need to refactor out the code in WorkerThread that deals with
"PendingActivity" (since we don't care about that for SharedWorkers) and
add a factory method for creating the WorkerContext, but the rest of the
code should work largely verbatim.

Most of WorkerContext should be common between shared and dedicated workers
- there are a few APIs (like postMessage() and dispatchMessage()) that
aren't needed for SharedWorkers, so we'll create a common base class that
contains the base functionality and support for items in WorkerGlobalScope
without any of the dedicated-specific or shared-specific functionality.

class SharedWorkerContext : public BaseWorkerContext {
public:
    // Support for specific items in SharedWorkerGlobalScope
    String name() const;
    void setOnconnect(PassRefPtr<EventListener> eventListener) {
        m_onconnectListener = eventListener;
    }
    EventListener* onconnect() const { return m_onconnectListener.get(); }
    // TODO: Add applicationCache functionality

private:
    RefPtr<EventListener> m_onconnectListener;
};

Worker Lifecycle
On the DOM side, the SharedWorker object should remain live
as long as it's reachable by JavaScript, or the SharedWorkerRepository holds
a reference to it (the SharedWorkerRepository releases the reference once
SharedWorkerRepository::workerThreadClosed() is invoked for the associated
SharedWorkerContext). As a future optimization, we could probably GC the
object earlier if it has no close event handler, but we won't do that
initially.

The current dedicated Worker code keys the reachability of the worker thread
to the reachability of the parent Worker object itself - the Worker object
destructor calls WorkerContextProxy::workerObjectDestroyed() which
terminates the worker thread (no close event is currently generated).

The current dedicated Worker implementation will suffice for dedicated
Workers until we support posting MessagePorts to the dedicated worker
thread. At that point we should probably change the dedicated worker code to
use a real MessagePort behind the scenes, and use the same
MessagePort-reachability mechanism that SharedWorkers will use.

Worker Lifecycle spec explained in non-normative language
The spec describes 3 states for workers: permissible, active needed, and
suspendable. Only workers that are active needed should be able to execute.
Suspendable workers should be suspended. All other workers should be
closed. Note: The HTML5 spec refers to a 4th protected worker state, but I
believe this to be unnecessary - I'm working with Ian Hickson to clarify
this.

Permissible
The spec says that a worker is *permissible* based on whether it has a
reachable MessagePort that has been entangled *at some point in the past*
with an active window (or with a worker that is itself permissible).
Basically, if a worker has *ever* been entangled with an active window, or
if it's ever been entangled with a worker that is itself permissible (i.e.
it's associated with an active window via a chain of workers that have been
entangled at some point in the past), then it's permissible.

The reason why the "at some point in the past" language is present is to
allow a page to create a fire-and-forget worker (for example, a worker that
does a set of long network operations) without having to keep a reference to
that worker around.

Once the referent windows close, the worker should also close, as being
permissible is a necessary (but not sufficient) criterion for being
runnable.

Active needed
A permissible worker is *active needed* if:

   1. it has pending timers/network requests/DB activity, or
   2. it is currently entangled with an active window, or another active
   needed worker.

The intent behind #1 is to enable fire-and-forget workers that don't exit
until they are idle. The intent behind #2 is that an idle worker shouldn't
exit as long as it's reachable by an active window (possibly chained through
other workers).

Suspendable
A *suspendable* worker is entangled with a non-active window object, or is
entangled with another suspendable worker. When a worker is suspendable it
should stop running (stop processing events) until it returns to the active
needed state.

Open issue: How do we handle this with dedicated workers currently? If you
navigate away from a window, does a dedicated worker get suspended, and
resumed again when you hit the back button?

Tracking permissible state
We will create a global map (protected by a mutex) keyed by WorkerContext
whose value is a set of active Documents associated with that
WorkerContext. When a Document has a port which becomes entangled with a
WorkerContext, we add that Document to the set of documents associated with
that WorkerContext in the map. When a Document becomes inactive, we remove
it from the map. When a WorkerContext becomes entangled with another
WorkerContext, the two sets of associated Documents are merged, and the
combined set is used for each context - in effect, both workers inherit the
Document associations of the other.

When a document closes, we walk the map and remove each reference to the
document. If a given WorkerContext has no more items, then the worker is no
longer permissible and we should close it.
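
The bookkeeping above could look roughly like the following sketch. The
class and method names are hypothetical, Document and WorkerContext are
empty placeholders, and sharing one set object between entangled workers
is just one possible reading of "the combined set is used for each
context".

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <vector>

struct Document {};
struct WorkerContext {};

class PermissibleTracker {
public:
    using DocumentSet = std::set<const Document*>;

    // A port owned by doc became entangled with worker.
    void documentEntangled(const WorkerContext* worker, const Document* doc)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        setFor(worker)->insert(doc);
    }

    // Two workers became entangled: merge their sets and share the result.
    void workersEntangled(const WorkerContext* a, const WorkerContext* b)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::shared_ptr<DocumentSet> merged = setFor(a);
        std::shared_ptr<DocumentSet> other = setFor(b);
        if (merged == other)
            return;
        merged->insert(other->begin(), other->end());
        for (auto& entry : m_sets) {
            if (entry.second == other)
                entry.second = merged;
        }
    }

    // Removes doc everywhere; returns workers that are no longer permissible.
    std::vector<const WorkerContext*> documentClosed(const Document* doc)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::vector<const WorkerContext*> orphaned;
        for (auto it = m_sets.begin(); it != m_sets.end();) {
            it->second->erase(doc);
            if (it->second->empty()) {
                orphaned.push_back(it->first);
                it = m_sets.erase(it);
            } else {
                ++it;
            }
        }
        return orphaned;
    }

private:
    std::shared_ptr<DocumentSet> setFor(const WorkerContext* worker)
    {
        std::shared_ptr<DocumentSet>& slot = m_sets[worker];
        if (!slot)
            slot = std::make_shared<DocumentSet>();
        return slot;
    }

    std::mutex m_mutex;
    std::map<const WorkerContext*, std::shared_ptr<DocumentSet>> m_sets;
};
```

The caller would close every worker returned by documentClosed().
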

Tracking active needed state
A worker is *active needed* if it's permissible
and has pending activity, or is reachable via a chain of MessagePorts from
an active window or worker. If we view the set of ScriptExecutionContexts
linked by MessagePorts as a graph, then a worker is reachable if its
subgraph is connected to an active window.

Determining whether a worker is active only requires a simple breadth-first
search of this graph, triggered when a WorkerContext has one of its ports
unentangled (currently the owning ScriptExecutionContext is not notified
when a port is unentangled, so we'll need to add code to generate this
notification). When the WorkerContext receives this notification, it can
grab a global mutex and traverse the graph - if the WorkerContext is no
longer connected to an active window it can initiate a close. When the
WorkerContext is closed, its own MessagePorts will be unentangled, which
will cascade to cause any related Workers to be shut down as appropriate.
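
As an illustration of the traversal described above, here is a minimal
breadth-first search over a graph of contexts linked by entangled ports.
Context and PortGraph are hypothetical stand-ins; the sketch omits the
global mutex and the pending-activity check, and only answers "is this
context still connected to an active window?".

```cpp
#include <map>
#include <queue>
#include <set>

// Stand-in for a ScriptExecutionContext (window or worker).
struct Context {
    bool isActiveWindow = false;
};

class PortGraph {
public:
    void entangle(const Context* a, const Context* b)
    {
        m_edges[a].insert(b);
        m_edges[b].insert(a);
    }

    void unentangle(const Context* a, const Context* b)
    {
        m_edges[a].erase(b);
        m_edges[b].erase(a);
    }

    // Breadth-first search from start across entangled ports.
    bool reachesActiveWindow(const Context* start) const
    {
        std::set<const Context*> visited{start};
        std::queue<const Context*> frontier;
        frontier.push(start);
        while (!frontier.empty()) {
            const Context* current = frontier.front();
            frontier.pop();
            if (current->isActiveWindow)
                return true;
            auto it = m_edges.find(current);
            if (it == m_edges.end())
                continue;
            for (const Context* next : it->second) {
                if (visited.insert(next).second)
                    frontier.push(next);
            }
        }
        return false;
    }

private:
    std::map<const Context*, std::set<const Context*>> m_edges;
};
```

A worker chained to a window through another worker is reachable until the
link to the window is unentangled, at which point both workers become
candidates for closing.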