[Webkit-unassigned] [Bug 35284] New: The libxml WebKit used may create multiple CDATA sections for original single CDATA section, which may break some web apps

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Feb 22 21:43:07 PST 2010


https://bugs.webkit.org/show_bug.cgi?id=35284

           Summary: The libxml WebKit used may create multiple CDATA
                    sections for original single CDATA section, which may
                    break some web apps
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: PC
        OS/Version: Mac OS X 10.5
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P2
         Component: XML
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: jnd at chromium.org


When you put the attached test.xml on your HTTP server and then open it in a
WebKit-based browser such as Safari or Chrome (for example,
http://localhost/test.xml?123; the query parameter avoids the cache), you will
see that the original single CDATA section is parsed into two or three CDATA
sections. To check this, type
javascript:alert(document.documentElement.childNodes.length) in the address
bar, or use the inspector to look at the multiple CDATA sections the browser
created.
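The same check can be written as a small diagnostic helper that counts the CDATA sections directly under the root element and reports their lengths. This is a sketch. The name `describeCdataSections` is hypothetical. It relies only on the standard DOM `childNodes`, `nodeType` and `nodeValue` properties, and CDATA section nodes have `nodeType` 4.

```javascript
// Hypothetical diagnostic helper: report every CDATA section that is a direct
// child of `root` and how many characters each one holds.
const CDATA_SECTION_NODE = 4; // same value as Node.CDATA_SECTION_NODE in the DOM

function describeCdataSections(root) {
  const lengths = [];
  for (const child of Array.from(root.childNodes)) {
    if (child.nodeType === CDATA_SECTION_NODE) {
      lengths.push(child.nodeValue.length);
    }
  }
  return { count: lengths.length, lengths: lengths };
}

// In a browser, for example from the inspector console:
//   describeCdataSections(document.documentElement)
// For test.xml, a parser without this bug should report count: 1.
```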

IE and Firefox do not have this issue.

Some web apps may be broken by this issue. For example, Discuz!, the most
popular forum platform in China, relies on correct XML parsing to implement
some features. In discuz\include\js\common.js, the function "ajaxpost" reads
the CDATA section and puts its contents into the page, as the following code
shows:

function ajaxpost(formid, showid, waitid, showidclass, submitbtn, recall) {
    ...
    var handleResult = function() {
        var s = '';
        ...
        try {
            if(BROWSER.ie) {
                s = $(ajaxframeid).contentWindow.document.XMLDocument.text;
            } else {
                s = $(ajaxframeid).contentWindow.document.documentElement.firstChild.nodeValue;
            }
        } catch(e) {
            ...
        }
        ...
    }
}

But in WebKit, because libxml parses the original single CDATA section into
multiple CDATA sections, firstChild.nodeValue returns only part of the content.
Only that part is added to the page, and the page's features break.
Potentially millions of Discuz!-based sites are affected. I personally think we
should fix this issue.
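Until the parser is fixed, a page script can avoid depending on the number of CDATA nodes by concatenating all of them. The sketch below is a hypothetical replacement for the `firstChild.nodeValue` line in the excerpt above, not code from Discuz!. The helper name `readRootText` is illustrative. In a DOM, `documentElement.textContent` gives a similar result, but it also includes text from nested elements.

```javascript
// Hypothetical workaround: join the values of all text and CDATA children of
// the response's root element, so the result no longer depends on how many
// CDATA sections the parser produced.
const TEXT_NODE = 3;
const CDATA_SECTION_NODE = 4;

function readRootText(root) {
  let s = '';
  for (let node = root.firstChild; node !== null; node = node.nextSibling) {
    if (node.nodeType === TEXT_NODE || node.nodeType === CDATA_SECTION_NODE) {
      s += node.nodeValue;
    }
  }
  return s;
}

// In ajaxpost's non-IE branch this would be used as:
//   s = readRootText($(ajaxframeid).contentWindow.document.documentElement);
```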

After digging into the libxml source code, I found the cause of the problem.
When the libxml parser has entered a valid CDATA section but cannot find the
section's end marker ("]]>") in the data received so far, it creates a small
CDATA section (300 xmlChars; see the definition of
XML_PARSER_BIG_BUFFER_SIZE). (Please refer to libxml/parser.c, line 10426:
base = xmlParseLookupSequence(ctxt, ']', ']', '>'); )

The content of the single CDATA section in test.xml is 65737 characters long
(all US-ASCII). When debugging with Chromium, the first chunk of data WebKit
sent to libxml had a length of 3852. Because libxml could NOT find the CDATA end
marker in that chunk, it created and pushed a 300-character CDATA section. Only
when the last chunk of data arrived did the libxml parser find the end marker
"]]>". The rest of the original CDATA section's content was then put into
another CDATA section. As a result we got multiple CDATA sections instead of
the single CDATA section in the original data. The issue occurs whenever a
CDATA section is too large for libxml to receive in a single chunk.
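The behavior described above can be illustrated with a simplified JavaScript model. It is not libxml's code. While the end marker is missing, the model flushes a fixed 300-character block. Once "]]>" is found, it delivers everything before the marker as a single block. The model assumes that at most one 300-character block is flushed per incoming chunk, which matches the report's observation that the 3852-character first chunk produced a single 300-character section. The names `cdataBlocks` and `BLOCK_SIZE` are illustrative. A consumer that creates one CDATA node per delivered block would end up with one node per element of the returned array.

```javascript
// Simplified model of push-mode CDATA delivery (not libxml's actual code).
// BLOCK_SIZE mirrors the 300-character block size described in the report.
const BLOCK_SIZE = 300;

// `chunks` holds the CDATA content as it arrives from the network; the last
// chunk contains the "]]>" terminator. Returns the length of each block that
// would be handed to the consumer, one entry per CDATA callback.
function cdataBlocks(chunks) {
  const blocks = [];
  let pending = '';
  for (const chunk of chunks) {
    pending += chunk;
    const end = pending.indexOf(']]>');
    if (end >= 0) {
      // Terminator found: everything before it is delivered at once.
      if (end > 0) blocks.push(end);
      return blocks;
    }
    // No terminator yet: flush one fixed-size block if enough data is
    // buffered (the extra 2 characters leave room for a partial "]]").
    if (pending.length >= BLOCK_SIZE + 2) {
      blocks.push(BLOCK_SIZE);
      pending = pending.slice(BLOCK_SIZE);
    }
  }
  return blocks; // input ended before "]]>" was seen
}

// cdataBlocks(['x'.repeat(1000), 'x'.repeat(1000) + ']]>'])  -> [300, 1700]
// cdataBlocks(['x'.repeat(2000) + ']]>'])                    -> [2000]
```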

I don't know why libxml handles CDATA sections this way. I removed the logic
that creates a 300-character CDATA section when the parser is inside a valid
CDATA section but cannot find its end marker, and the bug was gone.
I am not familiar with libxml. If any experts know the reason for this logic
(why it is used and whether it can be changed), please help fix this issue.
Otherwise, I am going to file a bug with xmlsoft for this issue.
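Another option would leave libxml unchanged. The code that receives the parser's CDATA callbacks could append each block to the CDATA node created for the previous block, instead of creating a new node every time. The sketch below models this with plain JavaScript objects. The function `appendCdataBlock` and the `{ children: [...] }` node shape are hypothetical and are not WebKit's actual parser code. One trade-off: a cdataBlock-style callback carries only the text and its length, so two genuinely separate CDATA sections that are adjacent in the source would also be merged.

```javascript
// Hypothetical consumer-side merge of contiguous CDATA blocks.
const CDATA_SECTION_NODE = 4;

// `parent.children` is a plain array standing in for a DOM child list.
// Each call corresponds to one CDATA callback from the parser.
function appendCdataBlock(parent, text) {
  const last = parent.children[parent.children.length - 1];
  if (last !== undefined && last.nodeType === CDATA_SECTION_NODE) {
    // The previous child is already a CDATA node: extend it instead of
    // creating a second node for the same run of CDATA content.
    last.nodeValue += text;
  } else {
    parent.children.push({ nodeType: CDATA_SECTION_NODE, nodeValue: text });
  }
}

// Usage:
//   const root = { children: [] };
//   appendCdataBlock(root, 'a'.repeat(300));
//   appendCdataBlock(root, 'b'.repeat(1700));
//   // root.children.length is 1; root.children[0].nodeValue.length is 2000
```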

Thanks!
