[Webkit-unassigned] [Bug 14608] New: Japanese encoding detection problem: KanjiCode::judge isn't called properly
bugzilla-daemon at webkit.org
Fri Jul 13 10:48:59 PDT 2007
http://bugs.webkit.org/show_bug.cgi?id=14608
Summary: Japanese encoding detection problem: KanjiCode::judge
isn't called properly
Product: WebKit
Version: 522+ (nightly)
Platform: All
OS/Version: All
Status: UNCONFIRMED
Severity: Normal
Priority: P3
Component: Page Loading
AssignedTo: webkit-unassigned at lists.webkit.org
ReportedBy: 808caaa4.8ce9.9cd6c799e9f6 at gmail.com
Hypotheses:
1. Many pages use UTF-8 but do not include a BOM.
2. Some pages have non-meta tags such as <div> at the top, so checkForHeadCharset()
ignores the charset specified in <meta>.
3. KanjiCode::judge is linked but does not seem to be called, because
encoding().isJapanese() seems to return false almost always.
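To make point 1 concrete, here is a minimal sketch of a UTF-8 BOM check. The helper name hasUtf8Bom is hypothetical and is not taken from WebKit; it only illustrates that a page without the leading bytes EF BB BF gives such a check nothing to go on.

```cpp
#include <cstddef>

// Hypothetical helper for illustration; not a WebKit function.
// Returns true if the buffer starts with the UTF-8 BOM (EF BB BF).
bool hasUtf8Bom(const char* data, std::size_t size) {
    // Compare as unsigned bytes so the result does not depend on char signedness.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    return size >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF;
}
```

A UTF-8 page that starts directly with markup or with Japanese text fails this test, so the encoding has to come from <meta> or from content-based detection.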
BTW, KanjiCode::judge cannot detect UTF-8.
UTF-8 Japanese text is almost always detected as Shift_JIS by KanjiCode::judge, so we
can distinguish between SJIS and UTF-8 after calling KanjiCode::judge:
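enum KanjiCode::Type judge_with_utf8_ja(const char* str, int size) {
    // UTF-8 Japanese strings are detected as Shift_JIS at this point.
    KanjiCode::Type r = KanjiCode::judge(str, size);
    // Is a result of SJIS really SJIS?
    if (r == KanjiCode::SJIS && size > 3) {
        // Compare as unsigned bytes so the test does not depend on char signedness.
        const unsigned char* p = reinterpret_cast<const unsigned char*>(str);
        int r80DF = 0; // runs whose first byte is in 0x80-0xDF
        int rE0FF = 0; // runs whose first byte is in 0xE0-0xFF
        for (int i = 0; i < size - 3; i++) {
            // Three non-ASCII bytes followed by a non-NUL ASCII byte.
            if (p[i] >= 0x80 && p[i + 1] >= 0x80 && p[i + 2] >= 0x80
                && p[i + 3] > 0 && p[i + 3] < 0x80) {
                if (p[i] < 0xE0) r80DF++; else rE0FF++;
            }
        }
        // In almost all cases, SJIS gives rE0FF == 0 and UTF-8 gives r80DF == 0.
        if (rE0FF > r80DF) r = KanjiCode::UTF8;
    }
    return r;
}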
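
As an illustration of why the counting step in judge_with_utf8_ja separates the two encodings, here is a minimal standalone sketch of that step alone, without the call to KanjiCode::judge. The names RunCounts, countHighByteRuns and looksLikeUtf8Ja are hypothetical and are not part of WebKit or KanjiCode.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for illustration; not part of WebKit or KanjiCode.
struct RunCounts {
    int lead80DF = 0; // runs whose first byte is in 0x80-0xDF
    int leadE0FF = 0; // runs whose first byte is in 0xE0-0xFF
};

// Counts runs of three non-ASCII bytes that are followed by a non-NUL ASCII
// byte, grouped by the value of the first byte of the run.
RunCounts countHighByteRuns(const std::string& text) {
    RunCounts counts;
    for (std::size_t i = 0; i + 3 < text.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(text[i]);
        const unsigned char b1 = static_cast<unsigned char>(text[i + 1]);
        const unsigned char b2 = static_cast<unsigned char>(text[i + 2]);
        const unsigned char b3 = static_cast<unsigned char>(text[i + 3]);
        const bool highRun = b0 >= 0x80 && b1 >= 0x80 && b2 >= 0x80;
        const bool asciiNext = b3 > 0 && b3 < 0x80;
        if (highRun && asciiNext) {
            if (b0 < 0xE0)
                ++counts.lead80DF;
            else
                ++counts.leadE0FF;
        }
    }
    return counts;
}

// True when the counts favor UTF-8 over Shift_JIS.
bool looksLikeUtf8Ja(const std::string& text) {
    const RunCounts c = countHighByteRuns(text);
    return c.leadE0FF > c.lead80DF;
}
```

For the hiragana "あい" followed by a space, the UTF-8 bytes are E3 81 82 E3 81 84 20: the only matching run starts with 0xE3, so leadE0FF wins. The Shift_JIS bytes are 82 A0 82 A2 20: the only matching run starts with 0xA0, so lead80DF wins. This is a heuristic, so short or unusual inputs can still be misclassified.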
--
Configure bugmail: http://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.