[Webkit-unassigned] [Bug 14608] New: Japanese encoding detection problem: KanjiCode::judge isn't called properly

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Jul 13 10:48:59 PDT 2007


           Summary: Japanese encoding detection problem: KanjiCode::judge
                    isn't called properly
           Product: WebKit
           Version: 522+ (nightly)
          Platform: All
        OS/Version: All
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P3
         Component: Page Loading
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: 808caaa4.8ce9.9cd6c799e9f6 at gmail.com


1. Many pages use UTF-8 but do not include a BOM.
2. Some pages have non-meta tags such as <div> at the top, so checkForHeadCharset()
ignores the charset specified in <meta>.
3. KanjiCode::judge is linked in but does not seem to be called, because
encoding().isJapanese() seems to return false almost always.
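
To illustrate point 1, here is a minimal sketch of a BOM test. hasUtf8Bom is a hypothetical helper written for this report, not a WebKit function. It only recognizes the three-byte UTF-8 BOM (EF BB BF), so a UTF-8 page without one gets no answer from it and has to be identified some other way.

```cpp
#include <cstddef>

// Hypothetical helper (not part of WebKit): returns true only when the
// buffer begins with the UTF-8 byte order mark EF BB BF.
bool hasUtf8Bom(const char* data, std::size_t size)
{
    if (size < 3)
        return false;
    // Compare as unsigned bytes so the result does not depend on whether
    // plain char is signed on this platform.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    return p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF;
}
```

A buffer that starts directly with UTF-8 Japanese text, for example the bytes E3 81 82, returns false here even though it is valid UTF-8.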

BTW, KanjiCode::judge cannot detect UTF-8.
UTF-8 Japanese text is almost always detected as Shift_JIS by KanjiCode::judge,
so we distinguish between SJIS and UTF-8 after calling KanjiCode::judge.

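enum KanjiCode::Type judge_with_utf8_ja(const char* str, int size) {
        // UTF-8 Japanese strings are detected as Shift_JIS at this point.
        int r = KanjiCode::judge(str, size);
        // Is the SJIS result really SJIS?
        if (r == KanjiCode::SJIS && size > 3) {
                int r80DF = 0;  // runs whose first byte is in 0x80-0xDF
                int rE0FF = 0;  // runs whose first byte is in 0xE0-0xFF
                for (int i = 0; i < size - 3; i++) {
                        // Three non-ASCII bytes followed by an ASCII byte.
                        // Note: the sign tests assume that plain char is signed.
                        if (str[i] < 0 && str[i+1] < 0 && str[i+2] < 0 && str[i+3] > 0) {
                                if (str[i] < -0x20) r80DF++; else rE0FF++;
                        }
                }
                // Almost always, SJIS: rE0FF==0, UTF8: r80DF==0
                if (rE0FF > r80DF) r = KanjiCode::UTF8;
        }
        return static_cast<enum KanjiCode::Type>(r);
}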
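
For trying the byte-counting step on its own, below is a self-contained sketch that does not depend on KanjiCode. The names Guess and guessSjisOrUtf8 are hypothetical and exist only for this illustration. It compares unsigned bytes instead of relying on the sign of char, but is otherwise meant to count the same runs as the function above.

```cpp
#include <cstddef>

// Hypothetical stand-in for the two outcomes of interest; not a WebKit type.
enum class Guess { ShiftJIS, UTF8 };

// Counts every position where three non-ASCII bytes are followed by a
// non-NUL ASCII byte, split by whether the first of the three bytes lies
// in 0x80-0xDF or in 0xE0-0xFF. More runs of the second kind suggest UTF-8.
Guess guessSjisOrUtf8(const char* str, std::size_t size)
{
    int r80DF = 0;
    int rE0FF = 0;
    for (std::size_t i = 0; i + 3 < size; ++i) {
        const unsigned char b0 = static_cast<unsigned char>(str[i]);
        const unsigned char b1 = static_cast<unsigned char>(str[i + 1]);
        const unsigned char b2 = static_cast<unsigned char>(str[i + 2]);
        const unsigned char b3 = static_cast<unsigned char>(str[i + 3]);
        if (b0 >= 0x80 && b1 >= 0x80 && b2 >= 0x80 && b3 > 0 && b3 < 0x80) {
            if (b0 < 0xE0)
                ++r80DF;
            else
                ++rE0FF;
        }
    }
    return rE0FF > r80DF ? Guess::UTF8 : Guess::ShiftJIS;
}
```

As a quick check, the UTF-8 bytes for "日本語" followed by a space (E6 97 A5 E6 9C AC E8 AA 9E 20) contain one run starting with E8, so the sketch returns Guess::UTF8. The Shift_JIS bytes for the same text (93 FA 96 7B 8C EA 20) contain one run starting with 93, so it returns Guess::ShiftJIS. Like the original, this is only a heuristic and very short inputs can be misjudged.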
