[Webkit-unassigned] [Bug 14608] Please add UTF-8 support to Japanese encoding auto-detection

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Jul 16 16:14:52 PDT 2007


------- Comment #7 from 808caaa4.8ce9.9cd6c799e9f6 at gmail.com  2007-07-16 16:14 PDT -------
// repost.
Sorry for the delayed response.

Sites with UTF-8 Japanese content and broken tags are mostly end-user sites,
and I do not want to pillory them.

I think the most important reason to support UTF-8 in Japanese auto-detection
is casual filters such as Greasemonkey, and perhaps similar features that may
be implemented in WebKit later.
Such a filter may strip out <meta> tags and pad something at the top.
That is at their own risk, but supporting UTF-8 would be kind, I think.

An additional question.

While I was collecting examples, an anonymous poster (2ch.net, poster
ID:xmYP4i2q0) posted this URL for fun:



(Currently) this URL has this sort of broken tag:

> <meta http-equiv="Content-Type" content="text/html; charset="utf-8">

In this case, detectJapaneseEncoding() seems not to be called (in another
Because the \x22 quotes are not correctly paired, checkForHeadCharset() loses
sync on the quotes, scans through the whole content, and returns false
(at 'if(ptr == pEnd) return false;' line 588).
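
To make the failure mode concrete, here is a minimal sketch of a quote-aware
tag scan. It is not the actual checkForHeadCharset() code; skipQuoted(),
findTagEnd() and tagCloses() are hypothetical names used only for
illustration. With the broken tag quoted above, the last \x22 before '>'
opens a quoted value that never closes, so the scan reaches the end of the
buffer.

```cpp
#include <cstring>

// Hypothetical helper, not WebKit code: starting just after an opening
// quote, advance to the matching quoteMark or to pEnd, whichever is first.
const char* skipQuoted(const char* ptr, const char* pEnd, char quoteMark)
{
    while (ptr != pEnd && *ptr != quoteMark)
        ++ptr;
    return ptr;
}

// Hypothetical helper: walk one tag, skipping quoted attribute values so
// that a '>' inside quotes is ignored. Returns false when the input is
// exhausted before the tag is closed.
bool findTagEnd(const char* ptr, const char* pEnd)
{
    while (ptr != pEnd) {
        if (*ptr == '>')
            return true;
        if (*ptr == '"' || *ptr == '\'') {
            const char quoteMark = *ptr;
            ptr = skipQuoted(ptr + 1, pEnd, quoteMark);
            if (ptr == pEnd)
                return false; // ran off the end looking for the closing quote
        }
        ++ptr;
    }
    return false;
}

// Convenience wrapper for NUL-terminated input.
bool tagCloses(const char* html)
{
    return findTagEnd(html, html + std::strlen(html));
}
```

Under these assumptions, tagCloses() fails for the broken tag and succeeds
once charset=utf-8 is written without the stray quote.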

On almost all websites, a tag's content does not contain linefeeds.
I think it is realistic to bail out successfully from the quote-pair scan
when a linefeed occurs.

Should I post this issue as a new thread, or wait?

My experimental code.
                        while (ptr != pEnd && *ptr != quoteMark)
                            if(*ptr=='\r' ||
                                // too long tag content : may have lost sync
                                // successfully bail out
                                m_checkedForHeadCharset = true;
                                return true;
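
The snippet above lost part of its condition in the mail. A self-contained
sketch of the same idea might look like the following. It assumes the missing
half of the test is a check for '\n', and QuoteScan, scanQuotedValue() and
classifyQuotedValue() are hypothetical names, not part of WebKit. In the real
function, the HitLineBreak case would correspond to setting
m_checkedForHeadCharset = true and returning true.

```cpp
#include <cstring>

// Outcome of scanning one quoted attribute value (hypothetical type).
enum class QuoteScan { Closed, HitLineBreak, HitEnd };

// Scan from just after an opening quote. Stops at the closing quoteMark,
// at a line break, or at pEnd. On return, ptr points at the stopping place.
// Assumption: the elided half of the original condition tests for '\n'.
QuoteScan scanQuotedValue(const char*& ptr, const char* pEnd, char quoteMark)
{
    while (ptr != pEnd && *ptr != quoteMark) {
        if (*ptr == '\r' || *ptr == '\n')
            return QuoteScan::HitLineBreak; // quotes are probably unpaired
        ++ptr;
    }
    return ptr == pEnd ? QuoteScan::HitEnd : QuoteScan::Closed;
}

// Convenience wrapper for NUL-terminated input that starts right after
// the opening quote.
QuoteScan classifyQuotedValue(const char* value, char quoteMark)
{
    const char* ptr = value;
    return scanQuotedValue(ptr, value + std::strlen(value), quoteMark);
}
```

With the broken tag, the text after the stray quote is '>' followed by a
line break, so the scan stops there instead of consuming the rest of the
document.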
