[Webkit-unassigned] [Bug 16179] New: any attribute name start with a unicode which like #xx00(x could be any hex number[0-9a-f]) will cause HTMLTokenizer parse error.

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Wed Nov 28 15:12:03 PST 2007


http://bugs.webkit.org/show_bug.cgi?id=16179

           Summary: any attribute name start with a unicode which like
                    #xx00(x could be any hex number[0-9a-f]) will cause
                    HTMLTokenizer parse error.
           Product: WebKit
           Version: 525+ (Nightly build)
          Platform: PC
        OS/Version: Windows XP
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P2
         Component: Page Loading
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: johnnyding.webkit at gmail.com


According to the HTML spec, an attribute name must be one of the basic HTML
data types, a NAME token, which must begin with a letter ([A-Za-z]) and may be
followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores
("_"), colons (":"), and periods (".").
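
As a rough illustration of the NAME token rule quoted above, here is a minimal
sketch of a validity check. isValidNameToken is a hypothetical helper written
for this report, not an existing WebKit function, and it uses std::u16string
as a stand-in for a UChar sequence.

```cpp
#include <string>

// Hypothetical helper (not part of WebKit): returns true if `name` is a
// NAME token, i.e. an ASCII letter followed by any number of letters,
// digits, '-', '_', ':' or '.'.
bool isValidNameToken(const std::u16string& name)
{
    auto isLetter = [](char16_t c) {
        return (c >= u'A' && c <= u'Z') || (c >= u'a' && c <= u'z');
    };
    auto isDigit = [](char16_t c) { return c >= u'0' && c <= u'9'; };

    // An empty name, or one that does not start with a letter, is invalid.
    if (name.empty() || !isLetter(name[0]))
        return false;

    for (std::u16string::size_type i = 1; i < name.size(); ++i) {
        char16_t c = name[i];
        if (!isLetter(c) && !isDigit(c)
            && c != u'-' && c != u'_' && c != u':' && c != u'.')
            return false;
    }
    return true;
}
```

Under this check, a name such as "href" passes, while a name that begins with
U+3000 is rejected because its first character is not an ASCII letter.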

However, WebKit's HTMLTokenizer does not check whether the characters of an
attribute name follow the HTML spec. It simply cuts off the high 8 bits of
each UChar and stores the low 8 bits in the attribute name buffer, which is a
char buffer used to gather the attribute name characters and generate the
final attribute name AtomicString. (section: case AttributeName,
func: HTMLTokenizer::parseTag, file: Webkit\html\HTMLTokenizer.cpp)

So if an attribute name starts with a Unicode character of the form #xx00, the
attribute name buffer ends up holding data like #00 #xx ..., and the current
attribute name becomes an empty AtomicString. Then, in section: case
QuotedValue, the empty attribute name causes the attribute name to be set to
the attribute value, which is of CDATA type and may contain characters that
are illegal in an attribute name. The function Token::addAttribute asserts
that the attribute name does not contain '/', so if the attribute value is a
URL, the assertion fails.

The following is a testcase. Some Chinese websites use the Chinese space
character U+3000 as whitespace to separate attribute name/value pairs, which
causes WebKit to hit the assertion failure.

To fix this problem, I think we could:
1) change the temporary attribute name/value buffer cBuffer to a UChar
buffer (some other code would of course need to change as well), or
2) detect the illegal character and discard it.
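
A minimal sketch of how the two options above might combine: the name is
gathered in a 16-bit buffer (option 1), and characters that cannot appear in
a NAME token are dropped (option 2). All names here (isAsciiLetter,
isNameChar, sanitizeAttributeName) are hypothetical and std::u16string is
only a stand-in for a UChar buffer; this is not a patch against
HTMLTokenizer.

```cpp
#include <string>

// Hypothetical helpers, not WebKit code.
bool isAsciiLetter(char16_t c)
{
    return (c >= u'A' && c <= u'Z') || (c >= u'a' && c <= u'z');
}

// Characters allowed after the first position of a NAME token.
bool isNameChar(char16_t c)
{
    return isAsciiLetter(c) || (c >= u'0' && c <= u'9')
        || c == u'-' || c == u'_' || c == u':' || c == u'.';
}

// Gathers an attribute name in a 16-bit buffer and discards any character
// that is illegal at its position: the first kept character must be a
// letter, later ones must be NAME characters.
std::u16string sanitizeAttributeName(const std::u16string& raw)
{
    std::u16string name;
    for (char16_t c : raw) {
        bool legal = name.empty() ? isAsciiLetter(c) : isNameChar(c);
        if (legal)
            name.push_back(c);
    }
    return name;
}
```

With this approach a leading U+3000 is skipped, so U+3000 followed by "href"
yields "href" rather than an empty name. Whether discarding is the right
behavior, as opposed to treating U+3000 as a separator, is a design question
this sketch does not settle.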

