[webkit-dev] XHR responseArrayBuffer attribute

Wed Sep 29 18:34:05 PDT 2010

On Sep 29, 2010, at 2:02 PM, Kenneth Russell wrote:

> On Tue, Sep 28, 2010 at 11:26 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>> 
>> On Sep 28, 2010, at 11:05 AM, Kenneth Russell wrote:
>> 
>>> On Tue, Sep 28, 2010 at 9:45 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>>>> 
>>>> On Sep 28, 2010, at 7:15 AM, Chris Marrin wrote:
>>>> 
>>>>> 
>>>>> On Sep 27, 2010, at 6:37 PM, Maciej Stachowiak wrote:
>>>>> 
>>>>>> 
>>>>>> On Sep 27, 2010, at 3:19 PM, Michael Nordman wrote:
>>>>>> 
>>>>>>> WebKit's XHR currently does not keep two copies of the data, as far as I can see. I think we should avoid that.
>>>>>> 
>>>>>> We could keep the raw data around, which hopefully is directly usable as an ArrayBuffer backing store, and only translate it to text format when/if the client requests responseText.
>>>>> 
>>>>> Yes, the raw data should be usable without translation in an ArrayBuffer. But we'd still need to make a copy of the raw bits when a new ArrayBuffer is created via responseArrayBuffer(), because that object is mutable.
>>>> 
>>>> Is there an immutable variant of ArrayBuffer? If not, we really need one. But even without that, note that you don't necessarily need to make an immediate copy; you can use copy-on-write.
>>>> 
>>>> The immutable variant would be helpful since we could avoid implementing threadsafe copy-on-write just to allow efficient passing of ArrayBuffers to Workers.
>>> 
>>> Chris has raised this issue on the public_webgl list, and we've begun
>>> discussion there, but I would like to point out that having an
>>> immutable ArrayBuffer and views on it does not help with the situation
>>> of passing data to or from a web worker. The side that constructs the
>>> data will necessarily have a mutable view, so it will be able to cause
>>> changes that can be seen on the other side even if the view on the
>>> other side is immutable.
>> 
>> Not if the side that got the data got it in immutable form in the first place. For example, if you get an immutable ArrayBuffer from XHR in a Worker, then you can pass it to another Worker or to the main thread without the need for any copying or copy-on-write.
>> 
>>> 
>>> We have a design that will allow efficient zero-copy producer/consumer
>>> queues to be implemented using TypedArrays while maintaining
>>> ECMAScript's shared-nothing semantics. I'll be happy to sketch it out,
>>> but it's probably most appropriate for a mailing list like
>>> public_webgl.
>> 
>> I'm curious to hear it and I don't follow public_webgl.
>> 
>> I'd specifically like to handle the following case:
>> - You obtain a chunk of binary data from the network or filesystem and want to pass it to another thread without copying.
> 
> The scenario to which I've given the most thought is that where a
> continuous stream of data is sent from a worker to the main thread,
> vice versa, or back and forth. I'll describe the solution for this
> case first and then discuss how it applies to yours.
> 
> In the producer/consumer scenario it is essential to avoid continuous
> memory allocation and deallocation. Ideally the main thread and worker
> would share a fixed size memory region. Since this would violate the
> shared-nothing semantic, a different solution is needed.
> 
> The idea is that when an ArrayBuffer is sent via postMessage, it is
> atomically "closed" on this side; its publicly visible length goes to
> 0, as do the lengths of any views referring to it. On the other side,
> a new ArrayBuffer wrapper object is synthesized, pointing to the same
> storage and with the original length.
> 
> To be able to reuse the same memory region over and over again, the
> other side would simply send the ArrayBuffer back for re-filling via
> postMessage. Ping-ponging ArrayBuffers back and forth achieves
> zero-copy transfer of large amounts of data while still maintaining
> the shared-nothing semantic. The only allocations are for the (tiny)
> ArrayBuffer wrapper objects, but the underlying storage is stable.
> 
> Implementing this idea will require a couple of minor additions to the
> TypedArray specification (in particular, the addition of a "close"
> method on ArrayBuffer) as well as defining the semantics of sending an
> ArrayBuffer via postMessage. I hope to prototype it soon.
> 
> Regarding your scenario, I would simply post the ArrayBuffer from the
> XHR object to the worker with the above semantics. The main thread
> would then not be able to access the data in the ArrayBuffer, but
> sending it to the worker for processing would not involve any data
> copies.
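
To make the proposed "close on send" semantics concrete, here is a minimal single-threaded TypeScript sketch. TransferableBuffer and transfer() are hypothetical names, not an existing API: transfer() stands in for sending an ArrayBuffer via postMessage, and no real Workers are involved.

```typescript
// Hypothetical model of the proposed "close on send" transfer semantics.
// Not a real API: transfer() stands in for sending via postMessage.
class TransferableBuffer {
  private storage: Uint8Array | null;

  constructor(storage: Uint8Array) {
    this.storage = storage;
  }

  static allocate(byteLength: number): TransferableBuffer {
    return new TransferableBuffer(new Uint8Array(byteLength));
  }

  // Publicly visible length; drops to 0 once the buffer has been sent.
  get byteLength(): number {
    return this.storage === null ? 0 : this.storage.length;
  }

  // Access for the current owner only. This sketch does not neuter views
  // handed out before a transfer; the proposal would zero their lengths too.
  bytes(): Uint8Array {
    if (this.storage === null) {
      throw new Error("buffer has been transferred");
    }
    return this.storage;
  }

  // Close this wrapper and synthesize a new one over the same storage,
  // with the original length. No bytes are copied.
  transfer(): TransferableBuffer {
    const storage = this.bytes();
    this.storage = null;
    return new TransferableBuffer(storage);
  }
}

// Usage: ping-pong one fixed region between a producer and a consumer.
const region = TransferableBuffer.allocate(4);
const backingStore = region.bytes();
region.bytes().fill(7);                                 // producer fills
const atConsumer = region.transfer();                   // "postMessage" to consumer
console.log(region.byteLength, atConsumer.byteLength);  // 0 4
const backAtProducer = atConsumer.transfer();           // sent back for re-filling
console.log(backAtProducer.bytes() === backingStore);   // true
```

The storage never moves; only the small wrapper object is re-created on each hop, which is what makes the ping-pong reuse allocation-free apart from the wrappers.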

Sure, transfer semantics avoid shared mutable state, though they would be inconsistent with most other "pure data" types. But what if you have some data that doesn't need mutating but that you'd like to share with multiple other Workers? Now you'd be forced to copy explicitly. The availability of an immutable variant would let you avoid that. At most, you'd need to copy once if your ArrayBuffer started out mutable; or you could have the ability to convert mutable to immutable at runtime (it would have to be a one-way conversion, of course).
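
A rough sketch of the one-way conversion, assuming a hypothetical FreezableBuffer type (nothing like it is in the TypedArray spec): once frozen, the same storage can be handed to any number of readers without copying, because none of them can observe a change.

```typescript
// Hypothetical buffer with a one-way mutable -> immutable conversion.
class FreezableBuffer {
  private readonly storage: Uint8Array;
  private frozen = false;

  constructor(byteLength: number) {
    this.storage = new Uint8Array(byteLength);
  }

  get isImmutable(): boolean {
    return this.frozen;
  }

  write(offset: number, value: number): void {
    if (this.frozen) {
      throw new Error("buffer is immutable");
    }
    this.storage[offset] = value;
  }

  read(offset: number): number {
    return this.storage[offset];
  }

  // One-way: there is deliberately no way to make it mutable again.
  freeze(): FreezableBuffer {
    this.frozen = true;
    return this;
  }

  // Once immutable, sharing is just handing out the same reference;
  // no reader can observe a change, so no copy is needed.
  share(): FreezableBuffer {
    if (!this.frozen) {
      throw new Error("freeze() before sharing, or copy explicitly");
    }
    return this;
  }
}

// Usage: build the data once, freeze it, hand it to several readers.
const table = new FreezableBuffer(4);
table.write(0, 42);
table.freeze();
const readers = [table.share(), table.share(), table.share()];
console.log(readers.every((r) => r === table)); // true: no copies
console.log(readers[1].read(0));                // 42
```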

> 
> I don't understand why the ArrayBuffer returned from the XHR needs to
> be strictly immutable. Since the application created the XHR object,
> it seems to me it should be fine if it mutates the copy of the data
> attached to it.

If the reference is immutable, then you can share the backing store with the cache. If it's mutable, at best you can do copy-on-write.
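
For comparison, a minimal copy-on-write handle, assuming a cache that owns the response bytes (CopyOnWriteBytes is a hypothetical name). Reads share the cache's storage and the first write pays for a full copy. It is single-threaded only; making it safe across Workers is where the threadsafe copy-on-write cost comes in.

```typescript
// Hypothetical copy-on-write handle over bytes owned by a cache.
// Single-threaded: a version shared across Workers would need the
// ownership check and the copy to happen atomically.
class CopyOnWriteBytes {
  private bytes: Uint8Array;
  private owned = false;

  constructor(shared: Uint8Array) {
    this.bytes = shared;
  }

  read(index: number): number {
    return this.bytes[index];
  }

  write(index: number, value: number): void {
    if (!this.owned) {
      // First mutation: take a private copy so the cache entry is untouched.
      this.bytes = this.bytes.slice();
      this.owned = true;
    }
    this.bytes[index] = value;
  }

  sharesStorageWith(other: Uint8Array): boolean {
    return this.bytes === other;
  }
}

// Usage: reads are free; the first write pays for a full copy.
const cacheEntry = new Uint8Array([1, 2, 3]);
const handle = new CopyOnWriteBytes(cacheEntry);
console.log(handle.sharesStorageWith(cacheEntry)); // true
handle.write(0, 99);
console.log(handle.sharesStorageWith(cacheEntry)); // false
console.log(cacheEntry[0], handle.read(0));        // 1 99
```

Every write path has to carry the ownership check; an immutable reference needs neither the check nor the copy.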

If the reference is mutable, then multiple independent pieces of code can't safely share the binary response from the same XHR. They would have to copy.

It also seems just plain surprising that you could get the response from XHR, modify it (perhaps thinking it is a private copy), and then find the changes still there when you re-get it later. responseText doesn't act that way. It would be even weirder if passing the response to a worker made you lose access to it unless you had explicitly copied it (assuming your transfer proposal).
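
A small sketch of the surprise, assuming a getter that returns the same mutable buffer on every access (AliasingResponse, CopyingResponse, and toArrayBuffer are hypothetical names, not the proposed XHR API). Handing out a fresh copy on every get removes the surprise but pays for a copy each time.

```typescript
// Hypothetical helper: build an ArrayBuffer from plain byte values.
function toArrayBuffer(bytes: number[]): ArrayBuffer {
  const buffer = new ArrayBuffer(bytes.length);
  new Uint8Array(buffer).set(bytes);
  return buffer;
}

// Returns the same mutable buffer on every access.
class AliasingResponse {
  private readonly raw: ArrayBuffer;

  constructor(bytes: number[]) {
    this.raw = toArrayBuffer(bytes);
  }

  get responseArrayBuffer(): ArrayBuffer {
    return this.raw;
  }
}

// Returns a private copy on every access: no surprises, but one copy per get.
class CopyingResponse {
  private readonly raw: ArrayBuffer;

  constructor(bytes: number[]) {
    this.raw = toArrayBuffer(bytes);
  }

  get responseArrayBuffer(): ArrayBuffer {
    return this.raw.slice(0);
  }
}

// Usage: a caller modifies what it believes is its own copy.
const aliased = new AliasingResponse([104, 105]);
const mine = new Uint8Array(aliased.responseArrayBuffer);
mine[0] = 72;
console.log(new Uint8Array(aliased.responseArrayBuffer)[0]); // 72: change persists

const copied = new CopyingResponse([104, 105]);
const mineToo = new Uint8Array(copied.responseArrayBuffer);
mineToo[0] = 72;
console.log(new Uint8Array(copied.responseArrayBuffer)[0]);  // 104: fresh copy
```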

Immutability lets you avoid a bunch of copying, and in the (likely) rare case where you want to mutate the binary response data, you can make a copy only when actually needed, instead of imposing costs on clients that only want to read.
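
Putting the pieces together, a sketch of a response that keeps a single copy of the raw bytes, decodes responseText only if asked, and gives read-only binary access with an explicit copy for the rare writer. BinaryResponse, ImmutableBytes, and mutableCopy() are hypothetical names, and the Latin-1 decoding is a simplification rather than real charset handling.

```typescript
// Hypothetical read-only view: callers can read, never write.
class ImmutableBytes {
  private readonly bytes: Uint8Array;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  get length(): number {
    return this.bytes.length;
  }

  at(index: number): number {
    return this.bytes[index];
  }

  // The only route to writable data: an explicit copy the caller asks for.
  mutableCopy(): Uint8Array {
    return this.bytes.slice();
  }
}

// Hypothetical response holder. It assumes it receives the raw bytes and
// that nobody else keeps a writable reference to them.
class BinaryResponse {
  readonly binary: ImmutableBytes;
  private readonly raw: Uint8Array;
  private text: string | null = null;

  constructor(raw: Uint8Array) {
    this.raw = raw;
    this.binary = new ImmutableBytes(raw); // shared, not copied
  }

  // Decoded only on first request, then cached. Latin-1 keeps the sketch
  // dependency-free; a real implementation would honor the charset.
  get responseText(): string {
    if (this.text === null) {
      let decoded = "";
      for (let i = 0; i < this.raw.length; i++) {
        decoded += String.fromCharCode(this.raw[i]);
      }
      this.text = decoded;
    }
    return this.text;
  }
}

// Usage: two readers share one store; only the rare writer pays for a copy.
const response = new BinaryResponse(new Uint8Array([104, 105]));
const readerA = response.binary;
const readerB = response.binary;
const scratch = readerB.mutableCopy();
scratch[0] = 72;
console.log(readerA.at(0), scratch[0]); // 104 72
console.log(response.responseText);     // hi
```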

Regards,
Maciej