[webkit-reviews] review denied: [Bug 16482] Hook up ICU's encoding detector and add a boolean param to Settings and WebPreferences : [Attachment 27576] updated patch

Fri Feb 27 05:43:47 PST 2009

Alexey Proskuryakov <ap at webkit.org> has denied Jungshik Shin
<jshin at chromium.org>'s request for review:
Bug 16482: Hook up ICU's encoding detector and add a boolean param to Settings
and WebPreferences
https://bugs.webkit.org/show_bug.cgi?id=16482

Attachment 27576: updated patch 
https://bugs.webkit.org/attachment.cgi?id=27576&action=review

------- Additional Comments from Alexey Proskuryakov <ap at webkit.org>
+	  - add TextEncodingDetection* to platform/text.

This class should be named TextEncodingDetector, not TextEncodingDetection.

+	    Tiger comes with ICU 3.2 with encoding detector.

"which doesn't support encoding detection"

+	 * WebCore.vcproj/WebCore.vcproj:
+	 * WebCore.xcodeproj/project.pbxproj:

Having per-file and per-function comments makes it easier for reviewers and
others who will read a patch to understand it. This is particularly important
for interesting patches like this one.

Please also edit makefiles for other build systems (GNUmakefile.am,
WebCore.pro, WebCore.scons, WebCoreSources.bkl).

+    , m_hintDecoder(NULL)

We use 0, not NULL in C++ code. There are several uses of NULL in the patch,
please fix them all.

+    return m_usesEncodingDetector
+	 && (m_source == DefaultEncoding
+	 || (m_source == EncodingFromParentFrame && m_hintDecoder
+	 && m_hintDecoder->source() == AutoDetectedEncoding)); 

This code is very difficult to read due to indenting. There is no need to make
lines that short, and I think that this condition can be made much easier to
read if you don't wrap it at 72 characters).

     // Do the auto-detect if our default encoding is one of the Japanese ones.

     // FIXME: It seems wrong to change our encoding downstream after we have
already done some decoding.
-    if (m_source != UserChosenEncoding && m_source != AutoDetectedEncoding &&
encoding().isJapanese())
+    if (m_source != UserChosenEncoding && m_source != AutoDetectedEncoding &&
encoding().isJapanese()) {
	 detectJapaneseEncoding(data, len);
+    } else if (shouldAutoDetect()) {
+	TextEncoding detectedEncoding;
+	if (detectTextEncoding(data, len, hintEncodingName(),
&detectedEncoding))
+	    setEncoding(detectedEncoding, AutoDetectedEncoding);
+   }

Our coding style says that there should be no braces around single-line blocks.
The comment before this code no longer matches what it does (it is no longer
only about Japanese).

Indenting on the last line here is incorrect.

I think that we should resolve this FIXME before making encoding detection more
common. It should be a separate bug blocking this one.

 String TextResourceDecoder::flush()
 {
+   // For HTML and XML document, if we can not find proper encoding even
+   // document is completely loaded, we need to use automatically detect
+   // the encoding of document if other conditions for autodetection is
+   // satisfied.

This comment should explain why HTML and XML are special. What about CSS, for
example?

+    void setHintDecoder(const TextResourceDecoder* hintDecoder) {
+	 m_hintDecoder = hintDecoder;
+    }

The opening brace should go on the next line.

+    EncodingSource source() const { return m_source; }

I don't see where this new public method us used outside TextResourceDecoder.
Let's not add it if it is not needed.

+    const char* hintEncodingName() const {
+	 return m_hintDecoder ? m_hintDecoder->encoding().name() : NULL;
+    }

Same comments about opening brace and NULL.

+    const TextResourceDecoder* m_hintDecoder;
+    //RefPtr<TextResourceDecoder> m_hintDecoder;

We don't commit commented out code. How are you going to manage hint decoder
lifetime?

+bool detectTextEncoding(const char* data, size_t len,
+			 const char* hintEncodingName,
+			 TextEncoding* detectedEncoding)
+{
+    *detectedEncoding = TextEncoding();
+#ifdef BUILDING_ON_TIGER
+    // Tiger came with ICU 3.2 and does not have the encoding detector.
+    return false;
+#else

This will cause unused argument warnings when built on Tiger, please add
UNUSED_PARAM macros for unused arguments.

Index: WebCore/platform/text/TextEncodingDetectionNone.cpp
===================================================================

--- WebCore/platform/text/TextEncodingDetectionNone.cpp (revision 0)
+++ WebCore/platform/text/TextEncodingDetectionNone.cpp (revision 0)
@@ -0,0 +1,11 @@
+namespace WebCore {
+
+bool detectTextEncoding(const char* data, size_t len,
+			 const char* hintEncodingName,
+			 TextEncoding* detectedEncoding)
+{
+    *detectedEncoding = TextEncoding();
+    return false;
+}
+
+}

This file needs copyright comments and includes. Same issue with unused
arguments.

	 Settings* settings = m_frame->settings();
-	 m_decoder = TextResourceDecoder::create(m_responseMIMEType, settings ?
settings->defaultTextEncodingName() : String());
+	 if (settings) {

You can write this in one line as 
	 if (Setting* settings = m_frame->settings()) {

+	     if (parentFrame && parentFrame->document())
+		 m_decoder->setHintDecoder(parentFrame->document()->decoder());


A different origin parent frame shouldn't affect child frame decoding for
security reasons, even through a hint. So, I'm not sure why the hint is needed
at all. Am I perhaps missing something?

+	} else {
+	     m_decoder = TextResourceDecoder::create(m_responseMIMEType,
String());
+	 }

There is a tab on "else" line. We don't use braces around single-line blocks.

This looks good for me for the most part. The biggest issues I see are lack of
clarity about hint encoding usage (and its decoder lifetime), and the FIXME I
asked to fix in a blocking bug.