[Webkit-unassigned] [Bug 37765] REGRESSION(57531): the commit-queue still hates Tor Arne Vestbø

Sun Apr 18 11:36:30 PDT 2010

https://bugs.webkit.org/show_bug.cgi?id=37765

--- Comment #18 from Chris Jerdonek <cjerdonek at webkit.org>  2010-04-18 11:36:29 PST ---
+++ b/WebKitTools/ChangeLog
@@ -1,3 +1,93 @@
+        We do not have to use u"" instead of "" because u"a" == "a" as expected
+        in Python.  Python will generate a warning to the console in cases where
+        a unicode() == str() operation cannot be performed.

Can you clarify what you are trying to say here?  It sounds like you might be
saying more than you mean to say -- e.g. that the caller never needs to use
unicode string literals, which is obviously not right.
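
To illustrate the concern, here is a minimal sketch of where the u"a" == "a"
observation holds and where it stops helping.  The variable names are
illustrative assumptions, not taken from the patch, and the snippet is written
to behave the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: the names below are illustrative, not taken from the patch.

ascii_text = u"a"
# For pure-ASCII values, text compares equal to an ordinary "" literal.
assert ascii_text == "a"

reviewer_text = u"Tor Arne Vestb\u00f8"          # text (unicode)
reviewer_bytes = reviewer_text.encode("utf-8")   # UTF-8 encoded bytes

# Once a non-ASCII character is involved, text and its encoded bytes are
# not interchangeable: they do not compare equal.
assert reviewer_text != reviewer_bytes

# Decoding first makes the comparison meaningful again.
assert reviewer_bytes.decode("utf-8") == reviewer_text
```

On Python 2 the unequal comparison is the case that produces the console
warning mentioned in the ChangeLog text.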

+        All places which use StringIO need to be sure to pass StringIO a
+        pre-encoded byte-array (str object) instead of unicode so that
+        clients which read from the StringIO don't have encoding exceptions.

It seems like we shouldn't need to use StringIO as much as we do (except
perhaps in unittest code to make certain tests easier).  It looks like you're
removing a lot of the places it's being used which seems good.
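
For reference, a minimal sketch of the "pre-encoded byte-array" pattern the
ChangeLog text describes.  It uses io.BytesIO as a stand-in for the StringIO
usage in the patch, which is an assumption made so the snippet behaves the
same on Python 2 and Python 3.

```python
import io

text = u"Reviewed by Tor Arne Vestb\u00f8."

# Encode once, up front, so the buffer holds bytes like a file on disk would.
buffer = io.BytesIO(text.encode("utf-8"))

# A client reading from the buffer gets bytes and decodes them explicitly.
raw = buffer.read()
assert isinstance(raw, bytes)
assert raw.decode("utf-8") == text
```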

@@ -117,7 +118,9 @@ class ChangeLog(object):

     def latest_entry(self):
         # ChangeLog files are always UTF-8, we read them in as such to support Reviewers with unicode in their names.
-        changelog_file = codecs.open(self.path, "r", "utf-8")
+        # We don't use codecs.open here to make the api for parse_latest_entry_from_file clearer.
+        # If we did, then it would be unclear as to whos reponsibility decoding of the file should be.
+        changelog_file = open(self.path, "r")
         try:
             return self.parse_latest_entry_from_file(changelog_file)

I understand what you're saying, but maybe the conclusion that should be drawn
is that we shouldn't have parse_latest_entry_from_file() as part of our API? 
In other words, the caller should be responsible.  That would also be more
consistent with the rule of thumb to decode as early as possible.
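
A rough sketch of what "the caller should be responsible" could look like.
The function names, the toy parsing rule, and the temporary file are all
hypothetical; io.open is used here for Python 2/3 portability, and the
codecs.open call from the removed line would serve the same purpose.

```python
import io
import os
import tempfile


def parse_latest_entry(lines):
    # Hypothetical parser: it only ever sees already-decoded text and
    # simply returns the first non-blank line.
    for line in lines:
        if line.strip():
            return line.strip()
    return None


def latest_entry(path):
    # The caller owns the file, so it decodes at the boundary (decode early).
    with io.open(path, "r", encoding="utf-8") as changelog_file:
        return parse_latest_entry(changelog_file)


# Usage: write UTF-8 bytes to a temporary file, then read them back as text.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(u"2010-04-18  Tor Arne Vestb\u00f8\n".encode("utf-8"))
    entry = latest_entry(path)
finally:
    os.remove(path)
```

With this split there is no question about who decodes: the parser never
touches bytes.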

+++ b/WebKitTools/Scripts/webkitpy/common/system/deprecated_logging.py
@@ -45,9 +45,10 @@ class tee:
     def __init__(self, *files):
         self.files = files

-    def write(self, string):
+    # Callers should pass an already encoded string for writing.
+    def write(self, bytes):
         for file in self.files:
-            file.write(string)
+            file.write(bytes)

Doesn't this also go against "unicode everywhere/encode late"?  Maybe this
would be a good spot to do type-checking between unicode/str for backwards
compatibility.
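
Something along these lines is what I have in mind; treat it as a sketch
rather than a proposed patch.  The class name Tee, the text_type helper, and
the choice of UTF-8 are assumptions for illustration.

```python
import io

# unicode on Python 2, str on Python 3.
text_type = type(u"")


class Tee(object):
    def __init__(self, *files):
        self.files = files

    def write(self, data):
        # Accept text and encode it as late as possible; pass bytes through
        # unchanged so existing callers keep working.
        if isinstance(data, text_type):
            data = data.encode("utf-8")
        for f in self.files:
            f.write(data)


first, second = io.BytesIO(), io.BytesIO()
tee = Tee(first, second)
tee.write(u"Vestb\u00f8\n")        # text: encoded inside write()
tee.write(b"already encoded\n")    # bytes: written as-is
```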

+++ b/WebKitTools/Scripts/webkitpy/layout_tests/layout_package/metered_stream.py
@@ -56,6 +56,8 @@ class MeteredStream:
         self._stream = stream
         self._last_update = ""

+    # FIXME: Does this take a string (unicode) or an array of bytes (str)?
+    # If it takes a string, it needs to call txt.encode("utf-8")
     def write(self, txt):
         """Write text directly to the stream, overwriting and resetting the
         meter."""

I'm a bit unclear on how far this needs to go.  For example, will we need to
pass unicode strings even for every log message we create?  You've probably
thought about this more than I have.

Out of curiosity, I started looking to see how Python's logging package handles
this.  I stopped when it started to look like the logging package has no
preference between str and unicode.  I think it might only be the particular
logging handlers that care, which is what the caller has control over (when
they configure logging).
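
As a small illustration of the handler being the place where encoding is
decided, here is a sketch using logging.FileHandler with its encoding
argument.  The logger name, message, and temporary file location are
hypothetical.

```python
import io
import logging
import os
import tempfile

message = u"Landed patch from Tor Arne Vestb\u00f8"

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "example.log")

logger = logging.getLogger("unicode_logging_sketch")  # hypothetical name
logger.setLevel(logging.INFO)
logger.propagate = False

# The caller picks the encoding when configuring the handler; the code that
# creates log messages just passes text.
handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

logger.info(message)

logger.removeHandler(handler)
handler.close()

# Read the log back as raw bytes to see what the handler actually wrote.
with io.open(log_path, "rb") as f:
    logged_bytes = f.read()

os.remove(log_path)
os.rmdir(log_dir)
```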
