[Webkit-unassigned] [Bug 37765] REGRESSION(57531): the commit-queue still hates Tor Arne Vestbø
bugzilla-daemon at webkit.org
Sun Apr 18 10:41:20 PDT 2010
https://bugs.webkit.org/show_bug.cgi?id=37765
--- Comment #15 from Chris Jerdonek <cjerdonek at webkit.org> 2010-04-18 10:41:18 PST ---
Thanks for taking this on and looking into this, Eric. (Good slide show in
comment 3, btw.)
A couple of random comments before a couple of comments on the patch:
(1) As the slides suggest, we may want to consider decoding using "utf-8-sig"
as a general practice instead of "utf-8":
"To increase the reliability with which a UTF-8 encoding can be detected,
Microsoft invented a variant of UTF-8 (that Python 2.5 calls "utf-8-sig")....
Before any of the Unicode characters is written to the file, a UTF-8 encoded
BOM (which looks like this as a byte sequence: 0xef, 0xbb, 0xbf) is written.
On decoding utf-8-sig will skip those three bytes if they appear as the first
three bytes in the file."
(from http://docs.python.org/library/codecs.html#encodings-and-unicode )
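To make the difference concrete, here is a minimal sketch. It assumes Python 3 bytes/str semantics; the quoted docs refer to Python 2.5, but the "utf-8-sig" codec behaves the same way here. The BOM bytes come from the standard codecs.BOM_UTF8 constant, and the sample text "hello" is purely illustrative:

```python
import codecs

# Simulate a file written by a tool that prepends a UTF-8 BOM (0xef, 0xbb, 0xbf).
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain "utf-8" keeps the BOM, so the string starts with U+FEFF.
with_bom = data.decode("utf-8")

# "utf-8-sig" skips the BOM when it appears as the first three bytes.
without_bom = data.decode("utf-8-sig")

# Input without a BOM decodes the same as it would with plain "utf-8",
# so "utf-8-sig" is a safe default when reading.
plain = "hello".encode("utf-8").decode("utf-8-sig")

print(repr(with_bom))
print(repr(without_bom))
print(repr(plain))
```

For reading files, the same idea would apply by passing encoding="utf-8-sig" to open(). Note that when encoding, "utf-8-sig" writes a BOM, so it may be preferable to use it only on the decoding side.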
(2) I came across this which might be useful somewhere (intelligent encoding
auto-detection):
http://chardet.feedparser.org/