[Webkit-unassigned] [Bug 14608] Please add UTF-8 support to Japanese encoding auto-detection
bugzilla-daemon at webkit.org
Mon Jul 16 16:14:52 PDT 2007
http://bugs.webkit.org/show_bug.cgi?id=14608
------- Comment #7 from 808caaa4.8ce9.9cd6c799e9f6 at gmail.com 2007-07-16 16:14 PDT -------
// repost.
Sorry for the delayed response.
Sites that serve UTF-8 Japanese pages with broken tags are mostly personal end-user sites,
and I would rather not hold them up to public ridicule here.
I think the most important reason to support UTF-8 in Japanese auto-detection is
casual content filters and Greasemonkey scripts (and perhaps, further ahead, similar features implemented in WebKit).
Such tools may strip out <meta> tags or insert something at the top of the page.
That is at their own risk, but supporting UTF-8 in Japanese auto-detection would be the forgiving choice, I think.
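For reference, one common heuristic for recognizing UTF-8 in a Japanese auto-detector is to check whether the bytes form well-formed UTF-8 containing at least one multibyte sequence; longer Shift_JIS or EUC-JP text usually fails that check. The C++ sketch below only illustrates that heuristic under this assumption: looksLikeUtf8() is a hypothetical name, and this is not how WebKit's detectJapaneseEncoding() works.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of WebKit): returns true when `bytes` is
// well-formed UTF-8 and contains at least one multibyte sequence.
// Pure ASCII returns false, since it says nothing about UTF-8 vs.
// Shift_JIS vs. EUC-JP.
bool looksLikeUtf8(const std::string& bytes)
{
    bool sawMultibyte = false;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) { // ASCII byte
            ++i;
            continue;
        }
        std::size_t length = 0;
        unsigned char min2 = 0x80, max2 = 0xBF; // allowed range of the 2nd byte
        if (lead >= 0xC2 && lead <= 0xDF) {
            length = 2;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            length = 3;
            if (lead == 0xE0) min2 = 0xA0; // reject overlong forms
            if (lead == 0xED) max2 = 0x9F; // reject UTF-16 surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            length = 4;
            if (lead == 0xF0) min2 = 0x90; // reject overlong forms
            if (lead == 0xF4) max2 = 0x8F; // reject code points above U+10FFFF
        } else {
            return false; // 0x80-0xC1 and 0xF5-0xFF never start a sequence
        }
        assert(length >= 2 && length <= 4);
        // Simplification: a sequence cut off by the end of the sniffed
        // buffer counts as invalid here; a real detector might tolerate it.
        if (i + length > bytes.size())
            return false;
        for (std::size_t k = 1; k < length; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            const unsigned char lo = (k == 1) ? min2 : 0x80;
            const unsigned char hi = (k == 1) ? max2 : 0xBF;
            if (c < lo || c > hi)
                return false;
        }
        sawMultibyte = true;
        i += length;
    }
    return sawMultibyte;
}
```

For example, "日本語" encoded as UTF-8 passes, while the same text in Shift_JIS (starting with byte 0x93) or EUC-JP (0xC6 0xFC ...) is rejected.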
Additional notes:
While I was collecting examples, an anonymous reporter (2ch.net, poster ID: xmYP4i2q0)
jokingly suggested this URL:
http://developer.apple.com/jp/
Kidding!
At the time of writing, this page has exactly this kind of broken tag:
> <meta http-equiv="Content-Type" content="text/html; charset="utf-8">
In this case, detectJapaneseEncoding() does not seem to be called at all, for a different
reason.
Because the double quotes (\x22) are not correctly paired, checkForHeadCharset() loses track of
which quotes open and close, scans through the entire remaining content, and finally returns false
(at 'if(ptr == pEnd) return false;', line 588).
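To make the loss of sync concrete, here is a minimal C++ sketch of a naive tag-end finder that pairs quotes left to right. It is not WebKit's checkForHeadCharset(), and findTagEnd() is a hypothetical name. The broken tag contains five double quotes, so the closing '>' is seen as part of a quoted value and the end of the tag is never found; a later quote elsewhere in the page then closes the stray one, which inverts what the scanner treats as quoted from there on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive tag-end finder (hypothetical): treats "..." and '...' runs as
// attribute values and returns the index of the first '>' seen outside a
// quoted run, or std::string::npos if the text runs out first.
std::size_t findTagEnd(const std::string& text)
{
    assert(!text.empty() && text[0] == '<'); // caller positions us at a tag
    char openQuote = '\0';                   // '\0' means "not inside quotes"
    for (std::size_t i = 1; i < text.size(); ++i) {
        const char c = text[i];
        if (openQuote != '\0') {
            if (c == openQuote)
                openQuote = '\0'; // quoted value closed
        } else if (c == '"' || c == '\'') {
            openQuote = c; // quoted value opened
        } else if (c == '>') {
            return i;
        }
    }
    return std::string::npos;
}
```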
On almost all websites, tags and their attribute values (such as content) do not contain line feeds.
So I think it is realistic to stop scanning for the closing quote, and bail out successfully,
as soon as a line feed is encountered.
Should I file this issue as a new bug, or wait?
My experimental code.
-----
while (ptr != pEnd && *ptr != quoteMark) {
    if (*ptr == '\r' || *ptr == '\n') {
        // A line break inside a quoted value: the tag is suspiciously long
        // and the quote pairing may have lost sync, so bail out successfully.
        m_checkedForHeadCharset = true;
        return true;
    }
    ++ptr;
}
-----
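A self-contained C++ sketch of the same idea, detached from WebKit's internals: it assumes a simple scanner over a std::string instead of the real ptr/pEnd buffer, and the names QuoteScan, skipQuotedValue() and scanHead() are hypothetical. With the bail-out disabled, the broken developer.apple.com/jp tag makes the scan run to the end of the data; with it enabled, the scan stops at the line break right after the tag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result of scanning one quoted attribute value.
enum class QuoteScan { Closed, LineBreak, EndOfData };

// Called with data[pos] on an opening quote. Advances pos past the matching
// closing quote. If bailOnLineBreak is set, stops at '\r' or '\n' instead,
// mirroring the experimental patch above.
QuoteScan skipQuotedValue(const std::string& data, std::size_t& pos, bool bailOnLineBreak)
{
    assert(pos < data.size() && (data[pos] == '"' || data[pos] == '\''));
    const char quoteMark = data[pos++];
    while (pos < data.size() && data[pos] != quoteMark) {
        if (bailOnLineBreak && (data[pos] == '\r' || data[pos] == '\n'))
            return QuoteScan::LineBreak; // give up: quote pairing is probably out of sync
        ++pos;
    }
    if (pos == data.size())
        return QuoteScan::EndOfData; // ran off the end, like 'if (ptr == pEnd) return false;'
    ++pos; // step over the closing quote
    return QuoteScan::Closed;
}

// Walks the whole buffer, skipping quoted values, and reports how it ended.
QuoteScan scanHead(const std::string& data, bool bailOnLineBreak)
{
    std::size_t pos = 0;
    while (pos < data.size()) {
        if (data[pos] == '"' || data[pos] == '\'') {
            const QuoteScan result = skipQuotedValue(data, pos, bailOnLineBreak);
            if (result != QuoteScan::Closed)
                return result;
        } else {
            ++pos;
        }
    }
    return QuoteScan::Closed;
}
```

The trade-off is that a legitimate attribute value containing a line break would also end the scan early; as noted above, that seems rare in practice.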