[Webkit-unassigned] [Bug 24906] 0x5C of EUC-JP is not Yen Sign but U+005C

Tue Mar 9 14:09:19 PST 2010

https://bugs.webkit.org/show_bug.cgi?id=24906

--- Comment #45 from Alexey Proskuryakov <ap at webkit.org>  2010-03-09 14:09:19 PST ---
+++ b/LayoutTests/editing/pasteboard/copy-backslash-from-euc.html

Can this test in particular be dumpAsText()?

It also seems useful to test copying unstyled text (i.e. copying from a form
input).

+Copied text must be a backslash: <input id="result">

I think that this could use a better worded instruction for running the test
manually. One doesn't know that the input contains copied text, so the text
could be just "Should be a backslash: ". Just like you have in other tests.

Do these tests pass in IE?

     // The text encodings below treat backslash as a currency symbol.
     // See http://blogs.msdn.com/michkap/archive/2005/09/17/469941.aspx for
more information.
-    static const char* const a =
atomicCanonicalTextEncodingName("Shift_JIS_X0213-2000");
-    static const char* const b = atomicCanonicalTextEncodingName("EUC-JP");
-    return (m_name == a || m_name == b) ? 0x00A5 : '\\';
+    DEFINE_STATIC_LOCAL(HashSet<const char*>, set, ());

The comment above the changed code should say "yen" specifically. I think there
is (was?) a similar problem with Korean encodings, and it's not handled here.

+    return m_name && set.contains(m_name) ? 0x00A5 : '\\';

I'd have added parentheses around the condition. I never remember precedence
rules, and that makes me write unambiguous code.

+    UChar unicodeNameMSPGothic[] = {0xFF2D, 0xFF33, 0x0020, 0xFF30, 0x30B4,
0x30B7, 0x30C3, 0x30AF, 0};
+    m_converterTypes.add(AtomicString(unicodeNameMSPGothic),
BackslashToYenSign);

It doesn't matter much in practice, but you could as easily avoid using
null-terminated strings:

    UChar unicodeNameMSPGothic[] = {0xFF2D, 0xFF33, 0x0020, 0xFF30, 0x30B4,
0x30B7, 0x30C3, 0x30AF, 0};
    m_converterTypes.add(AtomicString(unicodeNameMSPGothic,
sizeof(unicodeNameMSPGothic) / sizeof(UChar)), BackslashToYenSign);

+    // FIXME: We should not do this when a user explicitly specifies a
+    //        non-hacked font.

No need to wrap the FIXME. But even in cases where comments are long enough to
wrap, we don't try to indent.

How bad is this issue in practice? I'm not sure if I can make a test case just
from this description.

+        static const UChar yenSign = 0x00A5;

This should go to CharacterNames.h.

+    switch (converterType(fontFamily, encoding)) {
+    case BackslashToYenSign: {
<...>
+    case NoConvertion:
+    default:
+        ASSERT_NOT_REACHED();
+    }

Using a switch for two cases, on of which can not be taken, is somewhat
unusual. Is there a reason you did that, and not just ASSERT+if()?

+    static FontTranscoder* transcoder = new FontTranscoder();

You should probably use DEFINE_LOCAL_STATIC here.

+#include "AtomicString.h"
+#include "AtomicStringHash.h"
+#include "PlatformString.h"

Not only there is no need to include PlatformString.h, but there is no need to
include AtomicString.h.

+    friend FontTranscoder* fontTranscoder();

We usually put friend declarations in private section, for what it's worth.

It's great to see transcoding finally move forward!

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.