[Webkit-unassigned] [Bug 24906] 0x5C of EUC-JP is not Yen Sign but U+005C

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Jan 12 02:47:13 PST 2010


--- Comment #35 from Shinichiro Hamaji <hamaji at chromium.org>  2010-01-12 02:47:12 PST ---
Jungshik, I think this bug is still live, though I'm new to this bug and the
details may have changed. I confirmed the bug as follows:

1. Access http://shinh.skr.jp/tmp/backslash_euc.html
2. Copy all text and paste it into the Windows command prompt
3. The command doesn't work :(

I agree with Jungshik that this bug is terrible when we copy text. For
example, a fairly large number of non-programmers post blog entries about
Windows BAT files as tips to improve their lives. Bookmarklets are another
example. So, I think we should remove this hack at least for copying (why are
TextEncoding::*display*(String|Buffer) used for copying? :-).
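
To illustrate the copy problem, here is a minimal sketch in Python (not
WebKit's C++). display_string is a hypothetical stand-in for the display-time
substitution, and the command text is made up; the point is only that text
passed through the display path no longer contains U+005C.

```python
YEN_SIGN = "\u00a5"
BACKSLASH = "\\"


def display_string(text: str) -> str:
    """Hypothetical stand-in for the display hack: show U+005C as U+00A5."""
    return text.replace(BACKSLASH, YEN_SIGN)


# Bytes as an EUC-JP page would send them; 0x5C is meant as a path separator.
page_bytes = b"cd C:\\Windows\\System32"
decoded = page_bytes.decode("euc_jp")

# If the clipboard is filled from the display path, the backslashes are gone.
copied_with_hack = display_string(decoded)

# If the clipboard is filled from the decoded text, they survive.
copied_without_hack = decoded
```

Pasting copied_with_hack into a command prompt hands it yen signs where it
expects path separators, which matches the failure in the steps above.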

As for displaying, I still dislike this hack, but I can agree that some websites
may expect 0x5c to become a yen sign. However, I think this hack isn't working
at all for websites with charset=Shift_JIS, maybe because we are testing whether
the encoding is "Shift_JIS_X0213-2000", not "Shift_JIS". I created two HTML files:

http://shinh.skr.jp/tmp/backslash_sjis.html (charset=Shift_JIS)
http://shinh.skr.jp/tmp/backslash_sjis0213.html (charset=Shift_JIS_X0213-2000)

On Mac, I don't see yen signs in the former file and I see yen signs in the
latter. On Windows and Linux, no conversion is done in either file, though I
see yen signs in both files on Windows because of the hacked font. So, it seems
the hack in question isn't working at all for websites whose charset is
Shift_JIS. I guess most websites which expect 0x5c to be a yen sign would use
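
A rough sketch of the suspected cause, in Python for brevity. The set contents
and the function name are assumptions for illustration, not the actual WebKit
code; the only detail taken from the observations above is that the check
matches "Shift_JIS_X0213-2000" and not "Shift_JIS".

```python
# Assumed list of encodings that get the backslash-to-yen treatment.
HACKED_ENCODINGS = {"EUC-JP", "Shift_JIS_X0213-2000"}


def backslash_shown_as_yen(charset: str) -> bool:
    """Exact-name check: a page is hacked only if its charset is listed."""
    return charset in HACKED_ENCODINGS


# A page declaring charset=Shift_JIS falls through the check.
sjis_hacked = backslash_shown_as_yen("Shift_JIS")
sjis0213_hacked = backslash_shown_as_yen("Shift_JIS_X0213-2000")
```

With an exact-name comparison like this, sjis_hacked is False and
sjis0213_hacked is True, which would be consistent with what I see on Mac.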

By the way, Alexey, it's very difficult to find a major website where this bug
is a big issue because major websites may

- prefer UTF-8
- know the issues around 0x5c and use U+FF3C and U+FFE5 instead

but I have found a bunch of complaints about this behavior in blogs, and this
bug has actually bitten me more than ten times... Please note that some major
SNS and blog services in Japan use EUC-JP, and their users cannot work around
this issue because they cannot change the charset in the service.

Summary: I believe the best fix is just removing this hack, but if we want to
keep this hack, we may want to change the code as follows:

1. Leave backslashes unchanged when a user copies text.
2. Add "Shift_JIS" to the list of hacked encodings.
3. (optional) Apply the hack only on Mac (I guess Windows people have hacked
fonts and don't need this hack, and Linux people can convert backslashes into
yen signs in their brain :).
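
The three changes above could look roughly like this. It is a sketch in Python
under assumptions, not a patch: Purpose, render_text, the platform strings and
the mac_only flag are all hypothetical names, and the encoding list is
illustrative.

```python
from enum import Enum


class Purpose(Enum):
    DISPLAY = "display"
    COPY = "copy"


# Change 2: "Shift_JIS" added next to the names assumed to be listed already.
HACKED_ENCODINGS = frozenset({"EUC-JP", "Shift_JIS_X0213-2000", "Shift_JIS"})


def render_text(text: str, charset: str, purpose: Purpose,
                platform: str, mac_only: bool = True) -> str:
    """Return text with U+005C shown as U+00A5 only where the hack applies."""
    # Change 1: never touch text that is going to the clipboard.
    if purpose is Purpose.COPY:
        return text
    if charset not in HACKED_ENCODINGS:
        return text
    # Change 3 (optional): restrict the hack to Mac.
    if mac_only and platform != "mac":
        return text
    return text.replace("\\", "\u00a5")
```

For example, render_text("a\\b", "Shift_JIS", Purpose.DISPLAY, "mac") shows a
yen sign, while the same call with Purpose.COPY returns the text unchanged.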

By the way, is it easy to do step 1 above?
