[webkit-dev] Notifications for Blob serialization/deserialization

Tue Jun 26 14:54:51 PDT 2012

I think revamping our Blob handling is one of the projects that I should be
working on next. What we have now is proving to be too difficult to work
with in a variety of ways.

The more I look at this particular difficulty with sending blobs within
serialized values (within Chromium), the more convinced I am that we should
switch to identifying the 'data underlying a blob' rather than the 'blob
object instance'. Currently our 'ids' identify a WebCore::Blob instance.
The trouble with putting ids on blob object instances is that they don't
cross process boundaries, so we end up either trying to create the id in
advance of the instance in some other process, or relying on the
originating instance still being around when the new instance in the other
process gets coined (based on the original's id). We have to juggle 2 ids
instead of one and conceptually it's just a mess.

What I'm thinking about is changes along these lines...

* WebCore::Blobs have an 'id' that refers to the 'data'. At ctor and dtor
time, they increment/decrement a refcount, via IPCs, on the underlying
'data' in the main browser. A clone or copy of that WebCore::Blob instance
anywhere in the system will carry that same 'id' and do that same
refcounting of the underlying data.

* When an IPC message containing a serialized blob (the id value) is in
transit, extra refs are applied to the underlying blob data. There are
several cases where we need to consider a blob as 'in transit'. Being
referred to in a WebCore::SSV is one place. Being referred to in a
content::SSV is another. As those objects come and go, they should inc/dec
any blobs they refer to accordingly. A content::SSV has to distinguish
between running in the main browser process and a child process to do that
properly.

* A challenging case is when sending SSVs containing blobs from the browser
process to a child (renderer|worker) process. After a message is put on the
wire going that direction, it's not safe to deref the blob data until the
receiving side has picked it up and added a ref of its own. I think we'll
need ACKs for messages flowing in that direction so the browser side knows
when it's safe to drop the extra in-transit ref.

* It's easier going the other way, from child to browser, since if the child
has a ref when the 'id' is put on the wire, that ref can't possibly be
removed prior to that msg reaching the browser.
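
To make the bullets above a bit more concrete, here is a minimal sketch of
refcounting keyed by the underlying data rather than by a Blob instance,
including the extra in-transit ref that is dropped on ACK. Everything in it
(BlobDataTable, DataId, sendToChild, onChildAck) is a hypothetical name
invented for illustration and not existing WebCore or Chromium API. Real
addRef/release calls would arrive over IPC; here they are plain method calls.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical browser-side table keyed by the id of the underlying data,
// not by any particular Blob object instance.
class BlobDataTable {
public:
    using DataId = std::uint64_t;

    // Registers new data; the creating Blob instance holds the first ref.
    DataId create() {
        DataId id = m_nextId++;
        m_refs[id] = 1;
        return id;
    }

    // Would be driven by a Blob ctor, or by an SSV that refers to the data.
    void addRef(DataId id) {
        auto it = m_refs.find(id);
        assert(it != m_refs.end());
        ++it->second;
    }

    // Would be driven by a Blob dtor, or by an SSV going away.
    void release(DataId id) {
        auto it = m_refs.find(id);
        assert(it != m_refs.end());
        if (--it->second == 0)
            m_refs.erase(it); // Last ref gone: the data can be dropped.
    }

    bool isAlive(DataId id) const { return m_refs.count(id) != 0; }

    // Browser -> child: take an extra in-transit ref when the message is
    // put on the wire, and drop it only once the child has ACKed.
    void sendToChild(DataId id) { addRef(id); }
    void onChildAck(DataId id) { release(id); }

private:
    DataId m_nextId = 1;
    std::unordered_map<DataId, unsigned> m_refs;
};
```

With this shape, the browser-side holder can go away while the message is
still on the wire and the data survives, because the in-transit ref is only
released after the child has added a ref of its own and ACKed.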

I'm not sure how much of this discussion really affects WebKit's native
blob handling. This whole Chromium multi-process problem doesn't exist
there. I think we can put high level abstractions in WebCore that we then
fork at that high level... and the forked Chromium impl deals with the
complexities in our multi-process case w/o impacting WebCore's native
single process impl.

Another source of difficulty stems from how the ThreadableBlobRegistry
posts all operations to the 'main' WebKit thread. That's not going to work
very well with how we handle IDB operations entirely from the 'context'
thread, even if it's a background worker context. Messages sent on behalf
of blobs take a longer, slower route to reach our main browser process than
those sent on behalf of IDB, so an IDB message that contains a blob id may
reach the main browser before the messages that create that blob do. I
think we have to avoid going through the 'main' WebKit thread in the
Chromium port for the blob handling messages to avoid odd races.
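
A toy model of that race, as a sketch only: the browser is reduced to a
receiver that notes when an IDB message names a blob id it has not been told
about yet. The names (Message, Kind, Browser, deliver) are hypothetical and
stand in for the real IPC plumbing, which is far more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_set>

enum class Kind { RegisterBlob, IdbPutWithBlob };

struct Message {
    Kind kind;
    std::uint64_t blobId;
};

// Hypothetical browser-side receiver: counts IDB messages that refer to a
// blob id the browser has not yet seen a registration for.
struct Browser {
    std::unordered_set<std::uint64_t> known;
    int unknownBlobRefs = 0;

    void receive(const Message& m) {
        if (m.kind == Kind::RegisterBlob)
            known.insert(m.blobId);
        else if (!known.count(m.blobId))
            ++unknownBlobRefs;
    }
};

// Feeds messages to a fresh Browser in the order they arrive and returns
// how many IDB messages referred to a not-yet-registered blob.
inline int deliver(const std::deque<Message>& arrivalOrder) {
    Browser browser;
    for (const Message& m : arrivalOrder)
        browser.receive(m);
    return browser.unknownBlobRefs;
}
```

If the blob registration detours through the 'main' thread and so arrives
after the IDB message, deliver() reports one dangling reference. If both
messages travel the same ordered route, it reports none.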

On Tue, Jun 26, 2012 at 11:03 AM, Joshua Bell <jsbell at chromium.org> wrote:

> On Tue, Jun 26, 2012 at 10:19 AM, Greg Billock <gbillock at google.com> wrote:
>
>> I've been working with Michael Nordman on a problem related to Blob
>> serialization. Currently, if we serialize a Blob, there's no
>> notification to the BlobRegistry, so the serialized Blob will be
>> garbage collected at an indeterminate time if the context in which it
>> was created is closed. This lifetime scoping is spelled out clearly
>> for Blob URLs created using the URL API. I see nothing in the spec
>> imposing the same lifetime scoping on Blobs themselves, so I think it
>> is an implementation flaw. To correctly handle this serialization
>> case, however, the BlobRegistry needs to know when serialization and
>> deserialization operations happen for Blob objects.
>>
>> I've created a patch that adds that logic and plumbs it through to the
>> Chromium API. WebKit patch:
>> https://bugs.webkit.org/show_bug.cgi?id=89921 The corresponding
>> Chromium-side patch is here: http://codereview.chromium.org/10662024/
>>
>>
>> The strategy I used is to create a new internal URL and use that for
>> serialization:
>>
>>  KURL cloneURL = BlobURL::createInternalURL();
>>  blobRegistry().didSerializeBlob(cloneURL, blob->url());
>>  m_writer.writeBlob(cloneURL.string(), blob->type(), blob->size());
>>  m_blobURLs.append(cloneURL.string());
>>
>> Then upon deserialization, there's a corresponding didDeserializeBlob
>> call.
>>
>> There are a couple alternatives I considered. One is to not instrument
>> the SerializedScriptValue this way, but require callers to use the
>> blobURLs() accessor and do the right thing then. I'm more in favor of
>> inserting this logic directly into serialization, since it removes an
>> implementation gotcha and I think this more closely follows the spec.
>>
>
> The blobURLs() accessor was put in as a stop-gap to let us detect
> Blobs-in-SSVs and fail early when the full plumbing was missing. Don't read
> too much into its presence. (In practical terms, its current use could be
> replaced by a "does this SSV contain a Blob?" flag, but at the time
> exposing the list seemed like it might be useful.)
>
>
>> Since the primary interaction with the BlobRegistry is via URL, I
>> maintained that as well. The other is to use BlobData directly here.
>> Michael is planning on doing some maintenance work to the ID system
>> used internally anyway, and that may end up being the right path to
>> take in that case.
>>
>> The open-ended implication of this change is that throwing away
>> serialized values is now not acceptable -- they need to be checked for
>> blob refs and then have some to-be-written code invoked to trigger the
>> deref of the BlobData. We don't know where all that may be happening,
>> currently, so it's still something of a science project.
>
>
> Please coordinate with ericu@ (I'm sure you're already in contact) who is
> looking at how to enable Blob support for IndexedDB. It's going to be one
> of the more exciting cases, which includes SSV data flowing across process
> boundaries (in the Chromium port), storage to disk, and deletion
> operations occurring in processes that do not have script contexts
> available.
>
>
>> We already
>> have a SerializedScriptValue object we could interrogate upon
>> destruction, but the whole point is that the wire format created there
>> could show up anywhere, so we need to either restrict that ability and
>> maintain object-level control, or trust callers of the serialized wire
>> string accessor to handle them correctly. Neither is particularly
>> exciting. Any better ideas?
>>
>> -Greg