[Webkit-unassigned] [Bug 81270] FileApi does not handle files with NFD encoded umlaut in file name

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Mar 20 16:14:22 PDT 2012


https://bugs.webkit.org/show_bug.cgi?id=81270





--- Comment #13 from Eric U. <ericu at chromium.org>  2012-03-20 16:14:22 PST ---
(In reply to comment #8)
> (From update of attachment 132844 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=132844&action=review
> 
> >> Source/WebCore/ChangeLog:3
> >> +        [Chromium] FileApi does not handle files with NFD encoded umlaut in file name
> > 
> > Did you mean the other way around? TextEncoding::encode produces NFC, which is the standard encoding on the Web.
> 
> No, I did not mean the other way around.
> The problem occurs on ChromeOS, where we can access files on external storage. These files may have names encoded in NFD form. When we try to access one, the lookup breaks because the file URL is in NFC form, and the file cannot be found on disk.


Just to expand on this, the problem comes when the path does a round-trip from UTF-8 to normalized UTF-16 and back to UTF-8 [now normalized].  ChromeOS then tries to find the original file on disk using the normalized filename, and fails.  It's not a matter of the FileSystem code treating all Unicode forms as equivalent; it's that the underlying host filesystem doesn't.  To most [all?] Linux filesystems, paths are just strings of bytes, and character sets/encodings don't come into it.
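
A minimal sketch of that round trip, in Python rather than WebKit's C++ and not taken from the patch. The file name, the `fake_disk` dict standing in for a byte-oriented host filesystem, and both helper functions are assumptions made up for illustration:

```python
import unicodedata

# Hypothetical name as stored on disk: "u" + U+0308 (combining diaeresis), i.e. NFD.
on_disk = unicodedata.normalize("NFD", "f\u00fcr.txt").encode("utf-8")

# Stand-in for a filesystem that treats paths as plain byte strings.
fake_disk = {on_disk: b"contents"}

def normalizing_round_trip(raw: bytes) -> bytes:
    """Decode UTF-8, normalize to NFC, encode back to UTF-8."""
    text = raw.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")

def plain_round_trip(raw: bytes) -> bytes:
    """Decode UTF-8 and encode back without touching the code points."""
    return raw.decode("utf-8").encode("utf-8")

after_nfc = normalizing_round_trip(on_disk)
after_plain = plain_round_trip(on_disk)

# The normalized bytes no longer match the on-disk name, so the lookup misses.
print(after_nfc == on_disk, after_nfc in fake_disk)
# The non-normalizing round trip preserves the bytes, so the lookup succeeds.
print(after_plain == on_disk, after_plain in fake_disk)
```

Both byte strings display as the same name, but only the unnormalized one is a key the byte-oriented store recognizes.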

This is an odd case that hasn't previously existed in the web platform AFAIK, so it's not clear what the perfect solution is.  But what's there now isn't working for that particular use case; they need a non-destructive round-trip.

We had also considered keeping the original string around [plumbing it all the way through as an opaque data blob, held in the directory entry].  But then if you pulled the name out of the DirectoryEntry into a JS variable, that sideband data wouldn't go with it, and so calls using that string would mysteriously fail.
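
A rough sketch of why the sideband approach leaks, again in Python with made-up names. `Entry`, `raw_name`, `encode_normalized`, and `fake_disk` are hypothetical stand-ins, not WebKit or FileSystem API types:

```python
import unicodedata
from dataclasses import dataclass

@dataclass
class Entry:
    """Hypothetical stand-in for a DirectoryEntry carrying a sideband blob."""
    name: str        # the string that script code sees
    raw_name: bytes  # opaque copy of the exact bytes on disk

def encode_normalized(name: str) -> bytes:
    """Assumed path encoder that normalizes to NFC before emitting UTF-8."""
    return unicodedata.normalize("NFC", name).encode("utf-8")

raw = "fu\u0308r.txt".encode("utf-8")   # NFD name as stored on disk
fake_disk = {raw: b"contents"}          # byte-keyed stand-in for the filesystem
entry = Entry(name=raw.decode("utf-8"), raw_name=raw)

# Going through the entry keeps the sideband bytes, so the lookup works.
found_via_entry = entry.raw_name in fake_disk

# Copying only the name into a variable drops the sideband data;
# re-encoding that string normalizes it and the lookup misses.
name_only = entry.name
found_via_string = encode_normalized(name_only) in fake_disk

print(found_via_entry, found_via_string)
```

The second lookup is the "mysterious" failure: nothing in the string itself tells the caller that the bytes it needs stayed behind in the entry.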


> >>> Source/WebCore/ChangeLog:10
> >>> +        -switch all calls to encode to normalizeAndEncode except the ones in platform/KURLGoogle.cpp
> >> 
> >> Does that introduce another difference between KURL and GURL?  Can you write a test in LayoutTest/fast/url that shows the difference for this patch?
> > 
> > The goal of existing encode() behavior is to make WebKit on any platform generate the same kind of data as Windows browsers do. So when faced with input that's not in NFC form (like a file name on OS X), it converts the data to NFC.
> > 
> > I think that this just makes sense. Sending randomly normalized data across the wire (in URLs or otherwise) does not. This patch seems to affect all URLs, not just file system ones.
> 
> Do you think it would make sense to introduce encodeWithoutNormalization() method and call it only when generating file urls for filesystem operations?
> 
> >> Source/WebCore/platform/text/TextEncoding.h:72
> >> +        CString normalizeAndEncode(const UChar*, size_t length, UnencodableHandling) const;
> > 
> > This new method does not make a lot of sense conceptually.
> > 
> > 1. There are multiple normalization schemes. Normalize to what?
> > 2. You only need to normalize when using one of Unicode encodings (such as UTF-*). So, exposing this as a public method on TextEncoding is confusing.
> > 3. As mentioned above, are there any cases where one wants non-NFC data?
> 
> see above
> 
> >> LayoutTests/fast/filesystem/file-from-file-entry-nfd-name.html:7
> >> +<script src="resources/file-from-file-entry-nfd-name.js"></script>
> > 
> > Please don't split tests into .html and .js. The only exception is fast/js directory, and then the .js part should be pure JavaScript, no DOM.
> 
> Hm, most of the tests in fast/filesystem are separated this way.
