[Webkit-unassigned] [Bug 37765] REGRESSION(57531): the commit-queue still hates Tor Arne Vestbø

bugzilla-daemon at webkit.org
Sun Apr 18 10:41:20 PDT 2010


https://bugs.webkit.org/show_bug.cgi?id=37765





--- Comment #15 from Chris Jerdonek <cjerdonek at webkit.org>  2010-04-18 10:41:18 PST ---
Thanks for taking this on and looking into this, Eric.  (Good slide show in
comment 3, btw.)

A couple of general comments before a couple of comments on the patch:

(1) As the slides suggest, we may want to consider decoding using "utf-8-sig"
as a general practice instead of "utf-8":

"To increase the reliability with which a UTF-8 encoding can be detected,
Microsoft invented a variant of UTF-8 (that Python 2.5 calls "utf-8-sig")....
Before any of the Unicode characters is written to the file, a UTF-8 encoded
BOM (which looks like this as a byte sequence: 0xef, 0xbb, 0xbf) is written. 
On decoding utf-8-sig will skip those three bytes if they appear as the first
three bytes in the file."

(from http://docs.python.org/library/codecs.html#encodings-and-unicode )
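
To make the difference concrete, here is a minimal sketch of how the two codecs
treat a leading BOM. It is written in Python 3 syntax rather than the Python 2
of this thread, and the sample string and variable names are illustrative
assumptions, not taken from the patch:

```python
import codecs

# Illustrative sample text (an assumption, not data from the patch).
text = "Tor Arne Vestbø"

# Bytes as they might appear in a file saved with a UTF-8 BOM (0xef 0xbb 0xbf).
data = codecs.BOM_UTF8 + text.encode("utf-8")

# Plain "utf-8" keeps the BOM, which shows up as a leading U+FEFF character.
plain = data.decode("utf-8")

# "utf-8-sig" skips the three BOM bytes when they start the input.
stripped = data.decode("utf-8-sig")

# Input without a BOM decodes the same way under either codec, which is why
# "utf-8-sig" is a plausible default for reading.
no_bom = text.encode("utf-8")
same = no_bom.decode("utf-8-sig") == no_bom.decode("utf-8")

print(plain.startswith("\ufeff"), stripped == text, same)
```

Note that this only concerns decoding. Encoding with "utf-8-sig" writes a BOM,
so it is probably not something to use for output by default.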

(2) I came across this library, which might be useful somewhere (it does
intelligent encoding auto-detection):

http://chardet.feedparser.org/



