[Webkit-unassigned] [Bug 47397] TextResourceDecoder::checkForHeadCharset can look way past the limit.

Fri Oct 8 16:25:25 PDT 2010

https://bugs.webkit.org/show_bug.cgi?id=47397

--- Comment #11 from Adam Barth <abarth at webkit.org>  2010-10-08 16:25:24 PST ---
> Interesting. HTMLTokenizer works on decoded stream and it's not clear how to do parsing before decoding, which is what we need to do here. Assuming some random encoding first (UTF8?) could perhaps lead to subtle bugs when decoded text could be interpreted incorrectly.

If the width of the characters is ok, it should be fine.  By design, we don't support encodings that tokenize differently than UTF8 (e.g., UTF7).

The bigger problem, I'd expect, is that HTMLTokenizer expects a 16bit character but we might have UTF8 input.  You might be able to finesse that because the HTMLTokenizer only looks at one character at a time.  You could just zero-pad the 8bit characters into 16bit characters.
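
A minimal sketch of the zero-padding idea, assuming the undecoded input is available as a plain byte string.  The helper name zeroPadTo16Bit and the use of std::vector<std::uint16_t> are hypothetical stand-ins for illustration, not WebKit APIs or types:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a WebKit API): widen each raw byte to a 16-bit
// code unit by zero-extension, without decoding the bytes first.
std::vector<std::uint16_t> zeroPadTo16Bit(const std::string& bytes)
{
    std::vector<std::uint16_t> units;
    units.reserve(bytes.size());
    for (char c : bytes) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended
        // on platforms where plain char is signed.
        units.push_back(static_cast<std::uint16_t>(static_cast<unsigned char>(c)));
    }
    return units;
}
```

With this widening, ASCII markup such as "<meta" maps to the same code units it would have after decoding, while each byte of a multi-byte UTF8 sequence becomes its own unit in the 0x80-0xFF range.  Those units are not correctly decoded text, but they cannot be mistaken for ASCII markup characters, which is presumably all a charset prescan needs.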
