[Webkit-unassigned] [Bug 35831] New: WebCore PreloadScanner Entity Detection Bug - Non-HTML Entities are being treated as entities
bugzilla-daemon at webkit.org
Sat Mar 6 12:46:02 PST 2010
https://bugs.webkit.org/show_bug.cgi?id=35831
Summary: WebCore PreloadScanner Entity Detection Bug - Non-HTML
Entities are being treated as entities
Product: WebKit
Version: 528+ (Nightly build)
Platform: Macintosh Intel
URL: http://www.vistaprint.com/gallery.aspx
OS/Version: Mac OS X 10.6
Status: UNCONFIRMED
Severity: Normal
Priority: P2
Component: Page Loading
AssignedTo: webkit-unassigned at lists.webkit.org
ReportedBy: mirthy at gmail.com
The entity detector in WebCore's PreloadScanner is broken.
The HTML tokenizer used will accept things that look like entities but aren't
and convert them into Unicode characters.
For example, in scanning the HTML to pull out IMG tags, we might have a case
like this:
<img src="http://www.webkit.org/getImage.aspx?id=12345&lang_id=1"/>
The tokenizer spots &lang_id=1 and thinks it might be an entity (it isn't!).
The test for entities in the PreloadScanner isn't correct, unlike the one in
HTMLTokenizer.
Code area:
http://trac.webkit.org/browser/trunk/WebCore/html/PreloadScanner.cpp#L257
The actual problematic line:
http://trac.webkit.org/browser/trunk/WebCore/html/PreloadScanner.cpp#L268
The loop halts and the text is checked for entities as soon as a
non-alphanumeric character is reached. It should really only be checked when
a semicolon is reached.
This causes query strings to get mangled: the "&lang" is swallowed and
replaced with a Unicode left angle bracket symbol (the character that
"&lang;" stands for). The mangled URL is then passed back to the preloader
looking like:
<img src="http://www.webkit.org/getImage.aspx?id=12345<_id=1"/>
The preloader then tries to fetch it with an invalid URL (which will most
likely 404).
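The behavior described above can be reproduced with a minimal standalone sketch. This is not the PreloadScanner code: the function name decodeEntitiesBuggy, the helper entityTable, and the three-entry table are hypothetical, and U+2329 is assumed as the expansion of "lang" (its HTML 4 value). It only illustrates what happens when the entity lookup fires on any non-alphanumeric terminator.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the real entity table; only the names
// mentioned in this report are included.
static const std::map<std::string, std::string>& entityTable()
{
    static const std::map<std::string, std::string> table = {
        {"amp", "&"},
        {"lt", "<"},
        {"lang", "\xE2\x8C\xA9"}, // assumed: U+2329 encoded as UTF-8
    };
    return table;
}

// Sketch of the reported behavior: the name after '&' is looked up as soon
// as ANY non-alphanumeric character ends it, not only ';'.
std::string decodeEntitiesBuggy(const std::string& input)
{
    std::string output;
    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] != '&') {
            output += input[i++];
            continue;
        }
        // Collect the alphanumeric run that follows the '&'.
        std::size_t nameEnd = i + 1;
        while (nameEnd < input.size()
               && std::isalnum(static_cast<unsigned char>(input[nameEnd])))
            ++nameEnd;
        const std::string name = input.substr(i + 1, nameEnd - i - 1);
        auto it = entityTable().find(name);
        if (it != entityTable().end()) {
            // The flaw: the terminator is never required to be ';'.
            output += it->second;
            i = nameEnd;
            if (i < input.size() && input[i] == ';')
                ++i; // consume the semicolon when there is one
        } else {
            output += '&'; // unknown name: keep the ampersand literally
            ++i;
        }
    }
    return output;
}
```

Calling decodeEntitiesBuggy on the src attribute from the example returns the URL with "&lang" replaced by the angle bracket, while an ordinary pair such as "a=1&b=2" passes through untouched because "b" is not an entity name.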
Other examples where this might be problematic:
&amp_energy=100
&lt-now=10
Basically, any query string variable name that starts like an HTML entity name
and has a non-alphanumeric separator.
The proposed fix would be to just remove the alphanumeric check. The semicolon
check above should be sufficient; if there are bad entities that are too long
or don't contain a semicolon, then leave them be.
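As a rough illustration of the semicolon-only rule, here is a standalone sketch. It is not a patch against PreloadScanner.cpp: the names decodeEntitiesSemicolonOnly and knownEntities are hypothetical, the table is a two-entry stand-in, and the length limit mentioned above is left out for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the real entity table.
static const std::map<std::string, std::string>& knownEntities()
{
    static const std::map<std::string, std::string> table = {
        {"amp", "&"},
        {"lt", "<"},
    };
    return table;
}

// Sketch of the proposed rule: a name is only treated as an entity when it
// is terminated by ';'. Anything else is copied through unchanged.
std::string decodeEntitiesSemicolonOnly(const std::string& input)
{
    std::string output;
    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] == '&') {
            // Collect the alphanumeric run that follows the '&'.
            std::size_t nameEnd = i + 1;
            while (nameEnd < input.size()
                   && std::isalnum(static_cast<unsigned char>(input[nameEnd])))
                ++nameEnd;
            const bool hasSemicolon =
                nameEnd < input.size() && input[nameEnd] == ';';
            if (hasSemicolon) {
                auto it = knownEntities().find(
                    input.substr(i + 1, nameEnd - i - 1));
                if (it != knownEntities().end()) {
                    output += it->second;
                    i = nameEnd + 1; // skip past the semicolon
                    continue;
                }
            }
        }
        // Not a semicolon-terminated known entity: leave the text as it is.
        output += input[i++];
    }
    return output;
}
```

With this rule, "&lt;" and "&amp;" are still decoded, while "&amp_energy=100" and "&lt-now=10" come back unchanged, so query strings like the ones above would reach the preloader intact.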
Build Info:
SVN Rev: 55620
Regular WebKit on Mac OS X
XCode 3.2.1