[webkit-dev] Notifications for Blob serialization/deserialization

Tue Jul 3 16:08:48 PDT 2012

Do you think the way my patch does this, by exposing two new methods on the
BlobController for notifying serialization and deserialization, will be
sufficient?

For IndexedDB, it looks to me that notifications like this (or on the
BlobData ID, as you propose) will be critical -- the IndexedDB will need to
read the blob values to store them, at which time it can release the refs
it holds. The math won't work out unless it can know that the ref it just
deleted was given to it in serialization, which my notifier would
accomplish. Similarly for history -- if history supports Blob, then
packaging that serialization needs to hold the refcount.

All these use cases amount to the same thing: use of a serialized Blob
outside the "normal" usage of creation/deletion within a single context. I
think we're all in agreement that Blob semantics according to the spec
should support this use case, even if Blob URLs explicitly don't.

Do you want to revamp the management API to use BlobData IDs before adding
a pair of de/serialization APIs? If we do, we can skip some of the
nastiness here, like managing an extra serialization internal URL. It'll
definitely be easier to see. If that's going to take a while, though, I'd
like to get a preliminary solution in place so web intents can use blobs.

Who's a good person to ask about the history API's support (or lack
thereof) of Blobs?

On Tue, Jun 26, 2012 at 2:54 PM, Michael Nordman <michaeln at google.com>wrote:

> I think revamping our Blob handling is one of the projects that I should
> be working on next. What we have now is proving to be too difficult to work
> with in a variety of ways.
>
> The more i look at this particular difficulty with sending blobs within
> serialized values (within chromium), the more convinced i am that we should
> switch to identifying the 'data underlying a blob' rather than the 'blob
> object instance'. Currently our 'ids' identify a WebCore::Blob instance.
> The trouble with putting ids on blob object instances is that they dont
> cross process boundaries, so we end up either trying to create the id in
> advance of the instance in some other process, or relying on the
> originating instance still being around when the new instance in the other
> process gets coined (based on the originals ids). We have to juggle 2 ids
> instead of one and conceptually its just a mess.
>
> What I'm thinking about is changes along these lines...
>
> * WebCore::Blobs have an 'id' that refers to the 'data'. At ctor and dtor
> time, they increment/decrement a refcount, via ipcs, on the underlying
> 'data' in the main browser. A clone or copy of that WebCore::Blob instance
> anywhere in the system will carry that same 'id' and do that same
> refcounting of the underlying data.
>
> * When a IPC message containing a serialized blob (the id value) is in
> transit, extra refs are applied to the underlying blob data. There are
> several cases where we need to consider a blob as 'in transit'. Being
> referred to in an WebCore::SSV is one place. Being referred to in a
> content::SSV is another. As those objects come and go, they should inc/dec
> any blobs they refer to accordingly. A content::SSV has to distinguish
> between running in the main browser process and a child process to do that
> properly.
>
> * A challenging case is when sending SSVs containing blobs from the
> browser process to a child (renderer|worker) process. After a message is
> put on the wire going that direction, its not safe to deref the blob data
> until the receiving side has picked it up and added a ref of its own. I
> think we'll need ACKS for messages flowing in that direction so the
> browser-side knows when its safe to drop the extra in transit ref.
>
> * Its easier going the other way, from child to browser, since if the
> child has a ref when the 'id' is put on the wire, that ref cant possibly be
> removed prior to that msg reaching the browser.
>
> Not sure how much of this disccusion affects webkit's native blob handling
> really? This whole chromium multi-process problem doesn't exist there. I
> think we can put high level abstractions in webcore that we then fork at
> that high level... and the forked chromium impl deals with the complexities
> in our multi-process case w/o impatcing webcore's native single process
> impl.
>
> Another source of difficulty stems from how the ThreadableBlobRegistry
> posts all operations to the 'main' webkit thread. That's not going to work
> very well with how we handle IDB operations entirely from the 'context'
> thread even if its a background worker context. Message sent on behalf of
> blobs take a longer slower route to reach our main browser process than
> those sent on behalf of IDB, so an IDB message that contains a blob id may
> reach the main browser prior to the blob messages that create that blob
> reach the main browser. I think we have to avoid going thru the 'main'
> webkit thread in the chromium port for the blob handling messages to avoid
> odd races.
>
>
>
> On Tue, Jun 26, 2012 at 11:03 AM, Joshua Bell <jsbell at chromium.org> wrote:
>
>> On Tue, Jun 26, 2012 at 10:19 AM, Greg Billock <gbillock at google.com>wrote:
>>
>>> I've been working with Michael Nordman on a problem related to Blob
>>> serialization. Currently, if we serialize a Blob, there's no
>>> notification to the BlobRegistry, so the serialized Blob will be
>>> garbage collected at an indeterminate time if the context in which it
>>> was created is closed. This lifetime scoping is spelled out clearly
>>> for Blob URLs created using the URL API. I see nothing in the spec
>>> imposing the same lifetime scoping on Blobs themselves, so I think it
>>> is an implementation flaw. To correctly handle this serialization
>>> case, however, the BlobRegistry needs to know when serialization and
>>> deserialization operations happen for Blob objects.
>>>
>>> I've created a patch that adds that logic and plumbs it through to the
>>> Chromium API. Webkit patch:
>>> https://bugs.webkit.org/show_bug.cgi?id=89921 The corresponding
>>> Chromium-side patch is here: http://codereview.chromium.org/10662024/
>>>
>>>
>>> The strategy I used is to create a new internal URL and use that for
>>> serialization:
>>>
>>>  KURL cloneURL = BlobURL::createInternalURL();
>>>  blobRegistry().didSerializeBlob(cloneURL, blob->url());
>>>  m_writer.writeBlob(cloneURL.string(), blob->type(), blob->size());
>>>  m_blobURLs.append(cloneURL.string());
>>>
>>> Then upon deserialization, there's a corresponding didDeserializeBlob
>>> call.
>>>
>>> There are a couple alternatives I considered. One is to not instrument
>>> the SerializedScriptValue this way, but require callers to use the
>>> blobURLs() accessor and do the right thing then. I'm more in favor of
>>> inserting this logic directly into serialization, since it removes an
>>> implementation gotcha and I think this more closely follows the spec.
>>>
>>
>> The blobURLs() accessor was put in as a stop-gap to let us detect
>> Blobs-in-SSVs and fail early when the full plumbing was missing. Don't read
>> too much into its presence. (In practical terms, its current use could be
>> replaced by a "does this SSV contain a Blob?" flag, but at the time
>> exposing the list seemed like it might be useful.)
>>
>>
>>> Since the primary interaction with the BlobRegistry is via URL, I
>>> maintained that as well. The other is to use BlobData directly here.
>>> Michael is planning on doing some maintenance work to the ID system
>>> used internally anyway, and that may end up being the right path to
>>> take in that case.
>>>
>>> The open-ended implication of this change is that throwing away
>>> serialized values is now not acceptable -- they need to be checked for
>>> blob refs and then have some to-be-written code invoked to trigger the
>>> deref of the BlobData. We don't know where all that may be happening,
>>> currently, so it's still something of a science project.
>>
>>
>> Please coordinate with ericu@ (I'm sure you're already in contact) who
>> is looking at how to enable Blob support for IndexedDB. It's going to be
>> one of the more exciting cases, which  includes SSV data flowing across
>> process boundaries (in the Chromium port), storage to disk, and deletion
>> operations occurring in processes that do not have script contexts
>> available.
>>
>>
>>> We already
>>> have a SerializedScriptValue object we could interrogate upon
>>> destruction, but the whole point is that the wire format created there
>>> could show up anywhere, so we need to either restrict that ability and
>>> maintain object-level control, or trust callers of the serialized wire
>>> string accessor to handle them correctly. Neither is particularly
>>> exciting. Any better ideas?
>>>
>>> -Greg
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.webkit.org/pipermail/webkit-dev/attachments/20120703/4ab0b280/attachment.html>