[Webkit-unassigned] [Bug 144320] URL paths should not be normalized when encoded

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Apr 28 23:49:10 PDT 2015


https://bugs.webkit.org/show_bug.cgi?id=144320

--- Comment #12 from Carlos Garcia Campos <cgarcia at igalia.com> ---
(In reply to comment #10)
> > The filename itself in the file system may or may not be normalized. In the particular case of a filename containing 'ñ', it can be encoded as U+006E U+0303 or as U+00F1, but these end up being different files because the bytes are different, even if the visual representation is the same.
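The difference between the two spellings of 'ñ' is easy to see at the byte level. Below is a small Python illustration using only the standard `unicodedata` module. The variable names are illustrative. The sketch shows that the precomposed (NFC) and decomposed (NFD) forms differ as code points and as UTF-8 bytes, yet become equal once both are normalized to the same form.

```python
import unicodedata

# Two spellings of "ñ": precomposed and decomposed.
precomposed = "\u00f1"   # U+00F1 LATIN SMALL LETTER N WITH TILDE
decomposed = "n\u0303"   # U+006E LATIN SMALL LETTER N + U+0303 COMBINING TILDE

# As code point sequences and as UTF-8 bytes, they differ...
print(precomposed == decomposed)     # False
print(precomposed.encode("utf-8"))   # b'\xc3\xb1'
print(decomposed.encode("utf-8"))    # b'n\xcc\x83'

# ...but they are canonically equivalent: normalizing both to the
# same form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

A file system that compares names as raw bytes therefore sees two distinct names. A normalization-aware one may treat them as the same file.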
> 
> Yes, I understand that this is what you are saying. This is a bug in
> server's file system - everything that supports Unicode must treat different
> normalization forms as equivalent, so a filesystem may not have two files
> whose names only differ in normalization form.

I don't think Linux or the most common Linux file systems know anything about encodings: filenames are just bytes (the only exceptions being the 0 (NUL) byte and '/', I think), so two files whose names differ in their bytes are simply different files. It seems HFS does care about encodings and normalization; I didn't know that. So maybe this change should be made specific to Linux (or other Unix systems, except Mac).

> > The very same files worked in chrome and firefox.
> 
> Yes, they work for you in a test case, but they won't work in other
> scenarios, most notably those that involve user input on a Mac. As I
> said, the behavior in WebKit is intentionally different, to have a more
> common Unicode form on the wire. Windows browsers have the luxury of letting
> the bytes through unchanged because their OS and the Internet both use the same
> form, but for Safari, it is not as straightforward.
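The disagreement comes down to whether a URL path is normalized before it is percent-encoded. The following Python sketch illustrates that trade-off and is not WebKit's code. `encode_path` and the `/files/...` path are hypothetical. `urllib.parse.quote` is used only as a stand-in for UTF-8 percent-encoding. Normalizing to NFC yields the common wire form, but when the server's byte-oriented file system stores the decomposed name, the normalized request no longer matches those bytes.

```python
import unicodedata
from urllib.parse import quote


def encode_path(path: str, normalize_to_nfc: bool) -> str:
    """Hypothetical helper: percent-encode a URL path as UTF-8,
    optionally normalizing it to NFC first."""
    if normalize_to_nfc:
        path = unicodedata.normalize("NFC", path)
    # quote() encodes to UTF-8 and percent-escapes non-ASCII bytes;
    # "/" is left alone by default, so path separators survive.
    return quote(path)


# Assumed example: a decomposed (NFD) filename as stored on the server.
nfd_path = "/files/n\u0303.txt"

print(encode_path(nfd_path, normalize_to_nfc=True))   # /files/%C3%B1.txt
print(encode_path(nfd_path, normalize_to_nfc=False))  # /files/n%CC%83.txt
```

Only the unnormalized form reproduces the exact bytes of the stored name. The normalized form is the one most other clients would send.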

Well, I isolated the problem in a test case, but the issue was happening in real cases. It's not that the server doesn't normalize the filenames; the server just uses whatever is in the file system.

> > Form data decoding hasn't changed, except for filenames, so what the user types in a search form is still normalized.
> 
> The changes in encodeRelativeString() are quite confusing, I'm not sure if
> that's correct. There is some "otherDecoded" string that is
> counter-intuitively a result of calling encode(), and that's separate from
> where the path is handled.
> 
> Another change in this patch is that filenames in form data are not encoded.
> This means that a file uploaded from a Mac will retain the custom HFS
> normalization form that is not used anywhere else - how is that the right
> thing to do?

I assumed all file systems treated filenames as just bytes; I didn't know HFS worked differently. We need to make this platform-dependent.
