[webkit-dev] localStorage quota limit

Wed Dec 2 22:51:04 PST 2009

On Wed, Dec 2, 2009 at 10:20 PM, Maciej Stachowiak <mjs at apple.com> wrote:

>
> On Dec 2, 2009, at 9:07 PM, Darin Fisher wrote:
>
> On Wed, Dec 2, 2009 at 8:44 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>
>>
>> On Dec 2, 2009, at 8:14 PM, Darin Fisher wrote:
>>
>> What about Maciej's comment?  JS strings are often used to store binary
>> values.  Obviously, if people stick to octets, then it should be fine, but
>> perhaps some folks leverage all 16 bits?
>>
>>
>> I think some people do use JavaScript strings this way, though not
>> necessarily with LocalStorage. This kind of use will probably become
>> obsolete when we add a proper way to store binary data from the platform.
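Here is a minimal Python sketch of what "leveraging all 16 bits" can look like. It assumes a hypothetical scheme that packs two bytes into each 16-bit code unit. The helper names `pack_bytes_to_units` and `is_surrogate` are illustrative, not an existing API. With arbitrary binary input, some code units land in the surrogate range (0xD800 through 0xDFFF) with no matching partner, so the resulting string is not well-formed UTF-16.

```python
# Hypothetical helper: pack arbitrary bytes two-per-16-bit code unit,
# the way some JavaScript code stores binary data in strings.
def pack_bytes_to_units(data: bytes) -> list[int]:
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input (an assumption of this sketch)
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]


def is_surrogate(unit: int) -> bool:
    # True for any UTF-16 surrogate code unit, high or low.
    return 0xD800 <= unit <= 0xDFFF


units = pack_bytes_to_units(bytes([0xD8, 0x01, 0x00, 0x41]))
print([hex(u) for u in units])              # ['0xd801', '0x41']
print(any(is_surrogate(u) for u in units))  # True: 0xd801 is a lone high surrogate
```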
>>
>> Most Web-related APIs are fully accepting of JavaScript strings that are
>> not proper UTF-16. I don't see a strong reason to make LocalStorage an
>> exception. It does make sense for WebSocket to be an exception, since in
>> that case charset transcoding is required by the protocol, and since it is
>> desirable in that case to prevent any funny business that may trip up the
>> server.
>>
>> Also, looking at UTF-16 more closely, it seems like all UTF-16 can be
>> transcoded to UTF-8 and round-tripped if one is willing to allow technically
>> invalid UTF-8 that encodes unpaired code units in the surrogate range as if
>> they were characters. It's not clear to me why Firefox or IE choose to
>> reject instead of doing this. This also removes my original objection to
>> storing strings as UTF-8.
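The lenient transcoding described above can be sketched in Python as follows. Properly paired surrogates are combined into one supplementary code point and written as a 4-byte sequence. An unpaired surrogate is written as a 3-byte sequence, exactly as if it were a BMP character. This is essentially the scheme later specified as WTF-8. The function names are hypothetical, and the decoder assumes its input was produced by the encoder, so it does not validate.

```python
def utf16_to_lenient_utf8(units: list[int]) -> bytes:
    """Encode 16-bit code units as UTF-8, passing unpaired surrogates
    through as 3-byte sequences instead of rejecting them."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Proper surrogate pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP unit, or an unpaired surrogate encoded as if it were a character.
            cp = u
            i += 1
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)


def lenient_utf8_to_utf16(data: bytes) -> list[int]:
    """Inverse of utf16_to_lenient_utf8 (assumes input produced by it)."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            cp, n = b, 1
        elif b < 0xE0:
            cp, n = b & 0x1F, 2
        elif b < 0xF0:
            cp, n = b & 0x0F, 3
        else:
            cp, n = b & 0x07, 4
        for k in range(1, n):  # fold in continuation bytes
            cp = (cp << 6) | (data[i + k] & 0x3F)
        i += n
        if cp >= 0x10000:
            # Split supplementary code points back into a surrogate pair.
            cp -= 0x10000
            units += [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)]
        else:
            units.append(cp)
    return units


# 'H', a lone high surrogate, 'i', then U+1F600 as a proper pair.
units = [0x48, 0xD800, 0x69, 0xD83D, 0xDE00]
encoded = utf16_to_lenient_utf8(units)
print(encoded)                                   # b'H\xed\xa0\x80i\xf0\x9f\x98\x80'
print(lenient_utf8_to_utf16(encoded) == units)   # True
```

For well-formed UTF-16 the output is ordinary UTF-8. Only lone surrogates produce the technically invalid 3-byte sequences. Because the encoder always combines an adjacent high/low pair, every input maps to a distinct byte string, which is what makes the round trip lossless.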
>>
>>
> I think it is typical for UTF-16 to UTF-8 conversion to involve the
> intermediate step of forming a Unicode code point.  If that cannot be done,
> then conversion fails.  This may actually be a security thing.  If something
> expects UTF-8, it is safer to ensure that it gets valid UTF-8 (even if that
> involves loss of information).
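For contrast, here is a minimal sketch of the strict behavior described above. Python's standard codecs stand in for a "default" converter. This is an assumption for illustration, not the code Firefox or IE actually use. The code units are first decoded as UTF-16, which forms real code points and rejects unpaired surrogates. The result is then encoded as UTF-8. The lossy variant uses the `errors="replace"` handler, which substitutes U+FFFD for the bad unit, so the output is always valid UTF-8 at the cost of information.

```python
import struct


def utf16_units_to_utf8_strict(units: list[int]) -> bytes:
    # Serialize the code units as little-endian UTF-16 bytes.
    raw = struct.pack(f"<{len(units)}H", *units)
    # Strict decoding forms code points and raises UnicodeDecodeError
    # on any unpaired surrogate.
    return raw.decode("utf-16-le").encode("utf-8")


def utf16_units_to_utf8_lossy(units: list[int]) -> bytes:
    raw = struct.pack(f"<{len(units)}H", *units)
    # Replace unpaired surrogates with U+FFFD: valid UTF-8, but lossy.
    return raw.decode("utf-16-le", errors="replace").encode("utf-8")


print(utf16_units_to_utf8_strict([0x48, 0x69]))  # b'Hi'
try:
    utf16_units_to_utf8_strict([0x48, 0xD800, 0x69])
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```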
>
>
> These security considerations seem important for WebSocket where the
> protocol uses UTF-8 per spec, but not for the internal storage
> representation of JavaScript strings in LocalStorage (where observable input
> and output are both possibly-invalid UTF-16).
>
> Regards,
> Maciej
>
>
Agreed.  I was responding to your statement: "It's not clear to me why
Firefox or IE choose to reject instead of doing this."  It seems likely to
me that neither Firefox nor IE made a concerted choice to treat bad UTF-16
this way.  It is probably just a consequence of using the default UTF-16 to
UTF-8 converter, which likely behaves as I described.

-Darin