[Webkit-unassigned] [Bug 184444] [GTK] webkit_web_view_load_html() garbages linked CSS content

bugzilla-daemon at webkit.org
Wed Apr 18 01:25:43 PDT 2018


--- Comment #5 from Milan Crha <mcrha at redhat.com> ---
Created attachment 338201

  --> https://bugs.webkit.org/attachment.cgi?id=338201&action=review

Trivial patch (just to show it)

(In reply to Michael Catanzaro from comment #4)
> WebKit doesn't try to guess your file encoding, ...

Maybe I did not use the word 'encoding' properly. If bug #127481 is right, and it seems it is, then WebKit has some expectation about the "encodings" of an HTML document and its CSS files, which is wrong from my point of view. The test above proves it, and it is all about the HTML document and its CSS sub-resource. I'll try to rephrase, but I'm afraid it won't help much.

The wk2-css.c test above loads an HTML document that contains:
   <meta http-equiv="content-type" content="text/html; charset=utf-8">
   <link type="text/css" rel="stylesheet" href="file:///usr/.....css">
using the webkit_web_view_load_html() function. The web view itself also has UTF-8 set as its default encoding. Whether the .css file is loaded properly depends solely on the actual content of the HTML file, which is wrong from my point of view. When the HTML file contains non-ASCII letters, the .css file is read as UTF-16 (so it looks like garbage). When the HTML file contains only ASCII letters, the .css file is read as UTF-8 (or some other single-byte encoding; which one does not matter, because the file contains only ASCII).

I can mix ASCII HTML with UTF-16 CSS, just as I can mix UTF-8/UTF-16 HTML with ASCII CSS; there should not be any issue with that. Furthermore, I believe most (if not all) UTF-16 files start with the byte order mark (U+FEFF), so it is easy to detect that a file is in UTF-16. When the mark is missing, some additional heuristics can be applied, but even then I'd expect the CSS to be in the default encoding unless the caller specifies another one.
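A minimal sketch of the detection order proposed above, written in C to match the wk2-css.c test. The helper guess_css_charset() is hypothetical and is not how WebKit actually decodes stylesheets: it only checks for a byte order mark (U+FEFF serialized as FF FE or FE FF, plus the UTF-8 form EF BB BF) and otherwise returns the default encoding supplied by the caller, with no further heuristics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of WebKit): choose a charset name for the
 * raw bytes of a stylesheet. A byte order mark, if present, decides;
 * otherwise the caller-supplied default is returned unchanged. */
static const char *guess_css_charset(const unsigned char *data, size_t len,
                                     const char *default_charset)
{
    assert(default_charset != NULL);

    /* UTF-8 encoding of U+FEFF: EF BB BF */
    if (len >= 3 && memcmp(data, "\xEF\xBB\xBF", 3) == 0)
        return "UTF-8";
    /* U+FEFF serialized little-endian: FF FE */
    if (len >= 2 && memcmp(data, "\xFF\xFE", 2) == 0)
        return "UTF-16LE";
    /* U+FEFF serialized big-endian: FE FF */
    if (len >= 2 && memcmp(data, "\xFE\xFF", 2) == 0)
        return "UTF-16BE";

    /* No BOM: no sniffing here, just the default (e.g. from WebKitSettings). */
    return default_charset;
}
```

With this order, an ASCII stylesheet without a BOM is decoded in the default encoding regardless of what the referring HTML contains. Note that the CSS Syntax specification's own fallback is more involved: after the BOM it also considers the protocol charset, an @charset rule, and the referring document's encoding, which is one way the HTML document can influence how its CSS is decoded.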

> What does this have to do with the bug?

Everything and nothing. loadData() can specify the encoding, which loadHTML()/loadPlainText() cannot. And you said you want to be explicit about encodings.

> It'd at least be good to see.

Here you are. It uses UTF-8, but it could use the default encoding from WebKitSettings instead; that depends on whether you'd want to extend the documentation of the two functions as well. I do not think the patch is production-ready, though, for the reasons given in comment #3.
