[webkit-reviews] review denied: [Bug 25487] windows-949 returned by document.{charset, characterset} is not recognized by Korean advertising servers : [Attachment 29952] updated patch (added displayName() to TextEncoding class)

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Sat May 2 00:44:47 PDT 2009


Alexey Proskuryakov <ap at webkit.org> has denied Jungshik Shin
<jshin at chromium.org>'s request for review:
Bug 25487: windows-949 returned by document.{charset,characterset}  is not
recognized by Korean advertising servers
https://bugs.webkit.org/show_bug.cgi?id=25487

Attachment 29952: updated patch (added displayName() to TextEncoding class)
https://bugs.webkit.org/attachment.cgi?id=29952&action=review

------- Additional Comments from Alexey Proskuryakov <ap at webkit.org>
+	 * dom/Document.cpp:
+	 (WebCore::Document::encoding):
+	 * platform/text/TextEncoding.cpp:
+	 (WebCore::TextEncoding::displayName):
+	 * platform/text/TextEncoding.h:

It is best to describe individual changes to functions in ChangeLog - this is
why prepare-ChangeLog generates this list.

+const char* TextEncoding::displayName() const {

The brace should go to next line.

+    // We treat EUC-KR as windows-949 (its superset), but need to expose 
+    // 'EUC-KR' because 'windows-949' is not recognized by most Korean
+    // web servers. 

This comment is somewhat misleading, because most Korean Web servers do not
care about how the engine calls this encoding internally, and have no way of
knowing that. It is only a certain (quite weird) technique of serving
JavaScript ads that is affected.

+    static const char* const a =
atomicCanonicalTextEncodingName("windows-949");

The TextEncoding class is used from multiple threads, so using static data
without any protection or explicit initialization is dangerous. This method is
not currently called from secondary threads, and this problem is already
present in other methods of this class, so I'm not requesting that you fix it
now - but we should do that some day.

+    return m_name != a ? m_name : "EUC-KR";

A "==" check or an if with early return would be slightly easier to read.

+	 const char* displayName() const;

This function name is a bit misleading, it sounds like a name used for
rendering. Maybe domName() would be better? If you prefer displayName(), please
add a short comment in the header.

+2009-05-01  Jungshik Shin  <jshin at chromium.org>
+
+	 Reviewed by NOBODY (OOPS!).
+
+	 http://bugs.webkit.org/show_bug.cgi?id=25487
+
+	 For euc-kr and other 8bit Korean encodings 
+	 (similar to euc-kr/windows-949), make document.charset return
+	 EUC-KR instead of windows-949. The latter is not recognized by
+	 Korean web servers.

No need to copy a description of WebCore changes to LayoutTests ChangeLog.

+  var charset = document.characterSet;
+  if (!charset)
+    charset = document.charset;
+  if (!charset)
+    charset = document.inputEncoding;
+  document.write("Encoding: " + charset + " (should be EUC-KR)");

Perhaps we should check all of these properties for having a correct value, not
just the first one that happens to be non-undefined?

All of the above are nitpicks, but I have one comment that may be substantial.
This patch makes WebKit behavior weird and incompatible in the following case:

<meta charset="x-windows-949">
<script>alert(document.characterSet)</script>

Firefox alerts "x-windows-949", WebKit ToT alerts "windows-949", but with this
patch, it will be "EUC-KR", which I think is a regression.

There are three options that I can see:
1) Ignore this problem for now, as it's unlikely that there are any servers
using x-windows-949 encoding name.
2) Make the document remember the original name for its encoding.
3) Make TextEncoding do that.

If the servers in question supported "w-windows-949", we could also consider
making that the canonical name, but unfortunately, adx.qubi.com does not
recognize it.

r- for now, as even if you prefer option 1, there is enough nitpicks to justify
another quick review round.


More information about the webkit-reviews mailing list