[Webkit-unassigned] [Bug 14608] Please add UTF-8 support to Japanese encoding auto-detection

Mon Jul 16 16:14:52 PDT 2007

http://bugs.webkit.org/show_bug.cgi?id=14608

------- Comment #7 from 808caaa4.8ce9.9cd6c799e9f6 at gmail.com  2007-07-16 16:14 PDT -------
// repost.
Sorry for the delayed response.

Pages that combine UTF-8 Japanese text with broken tags are mostly end-user
sites, and I do not want to pillory them by listing them here....

I think the most important reason to support UTF-8 in Japanese auto-detection
is casual filters such as Greasemonkey scripts, and perhaps similar features
implemented in WebKit in the future.
Such a filter may strip out <meta> elements and insert something at the top
of the page.
That is at the filter author's own risk...but supporting UTF-8 would be the
gentle thing to do, I think.

One additional question.

While I was collecting examples, an anonymous reporter (2ch.net, poster
ID: xmYP4i2q0) jokingly suggested this URL:

http://developer.apple.com/jp/

Kidding!

(Currently) this page does contain a broken tag of that kind:

> <meta http-equiv="Content-Type" content="text/html; charset="utf-8">

In this case, detectJapaneseEncoding() does not seem to be called at all (for
a different reason)....
Because the \x22 (double quote) characters are not correctly paired,
checkForHeadCharset() loses quote sync, consumes the entire content, and
returns false (at 'if(ptr == pEnd) return false;', line 588).
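
To make the quote mismatch concrete, here is a minimal sketch that pairs up
double quotes in a piece of markup and reports the first one left without a
partner. The function name firstUnpairedQuote() is hypothetical and this is
not WebKit code; it only illustrates why the scanner has nothing to stop at.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of WebKit: returns the index of the first
// double quote that has no closing partner, or std::string::npos if every
// double quote in `markup` is paired.
std::size_t firstUnpairedQuote(const std::string& markup)
{
    std::size_t pos = 0;
    while (true) {
        std::size_t open = markup.find('"', pos);
        if (open == std::string::npos)
            return std::string::npos; // no more quotes: everything is paired
        std::size_t close = markup.find('"', open + 1);
        if (close == std::string::npos)
            return open;              // this opening quote is never closed
        pos = close + 1;
    }
}
```

For the <meta> tag quoted above, the quotes pair up as "Content-Type" and
"text/html; charset=", so the quote after utf-8 is treated as an opening quote
with no partner, and a scan for its closing quote runs past the end of the tag.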

On almost all websites, the content of a tag attribute does not contain
linefeeds.
I think it is realistic to abort the quote-pair scan, reporting success, as
soon as a linefeed occurs.

Should I file this issue as a new bug, or wait?

My experimental code.
-----
while (ptr != pEnd && *ptr != quoteMark) {
    if (*ptr == '\r' || *ptr == '\n') {
        // too long tag content: may have lost sync
        // bail out successfully
        m_checkedForHeadCharset = true;
        return true;
    }
    ++ptr;
}
-----
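
The fragment above depends on the surrounding decoder state, so here is a
self-contained sketch of the same idea that can be tried in isolation. The
names QuoteScan and scanQuotedValue() are hypothetical; in WebKit the bail-out
would set m_checkedForHeadCharset and return true instead of returning an
enum value.

```cpp
// Hypothetical result type for this sketch only.
enum class QuoteScan { Closed, BailedOutAtLinefeed, ReachedEnd };

// Scans [ptr, pEnd) for quoteMark, giving up at the first CR or LF.
// On return, *next points at the closing quote (Closed), at the linefeed
// (BailedOutAtLinefeed), or at pEnd (ReachedEnd).
QuoteScan scanQuotedValue(const char* ptr, const char* pEnd, char quoteMark,
                          const char** next)
{
    while (ptr != pEnd && *ptr != quoteMark) {
        if (*ptr == '\r' || *ptr == '\n') {
            // A linefeed inside an attribute value is treated as a sign
            // that quote sync was lost, so stop here.
            *next = ptr;
            return QuoteScan::BailedOutAtLinefeed;
        }
        ++ptr;
    }
    *next = ptr;
    return ptr == pEnd ? QuoteScan::ReachedEnd : QuoteScan::Closed;
}
```

With this shape, a caller can tell a normally closed value apart from a
bail-out, and the scan never reads past the line that holds the broken tag.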
