[Webkit-unassigned] [Bug 156626] New: Would like a way to prevent user-controlled markup from breaking out of an element
bugzilla-daemon at webkit.org
bugzilla-daemon at webkit.org
Fri Apr 15 08:21:27 PDT 2016
Bug ID: 156626
Summary: Would like a way to prevent user-controlled markup
from breaking out of an element
Version: Safari Technology Preview
Component: WebCore Misc.
Assignee: webkit-unassigned at lists.webkit.org
Reporter: aroben at webkit.org
Created attachment 276475
screenshot of broken page layout due to user-controlled markup
github.com displays lots of user-controlled markup in comments on issues and pull requests. Users submit Markdown (which can include raw HTML) and we transform it to HTML and display it on our pages. Serving user-controlled markup poses a security risk, which we mitigate in two ways:
1. We sanitize the HTML that comes out of our Markdown parser. We only allow certain elements, attributes, and attribute values and strip out anything that doesnât match our whitelist.
2. We use mechanisms like Content Security Policy to provide an extra layer of protection if any unwanted content slips through the sanitization.
But even with these mitigations it is still possible for users to unintentionally break our UI (and thus it may be possible for malicious users to use this for phishing etc.). One way for this to happen is via interactions between lists and tables in the HTML parser. A live example of this exists on https://github.com/mantisbt/mantisbt/pull/272. See the attached screenshot. (The rendering issues are less dramatic when youâre logged out, but still present.) Iâll explain whatâs going on here in detail.
One of the comments on this page contains the following Markdown:
- echo '<div><table class="width75">', "\n",
+ echo '<div><table class="width-70">', "\n",
Clearly this is an excerpt of a diff, and if the user had enclosed it in triple-backticks (```) it would have been rendered as such. But since it is not escaped by backticks, it gets interpreted as an unordered list, generating the following HTML:
<li>echo '<div><table class="width75">', "\n",</li>
<li>echo '<div><table class="width-70">', "\n",</li>
This HTML is then sent through our HTML sanitizer. The sanitizer is based on an HTML-compliant parser (https://github.com/google/gumbo-parser) which parses this HTML as a document fragment, strips out elements, attributes, and attribute values that donât match our whitelist, and then reserializes the document fragment to produce the following HTML:
<li>echo '<div>', "\n",<li>echo '<div></div></li><table>
Looking at the innerHTML view on the Live DOM Viewer shows the same result (other than the class attributes that the sanitizer removed): http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4062
This sanitized HTML is then inserted into our pageâs markup. You can see the skeleton of our pageâs markup here: http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4063
Inserting the sanitized HTML between the âuser content start/endâ comments gives this: http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4064
You can see that the âthis should be inside the boxâ text has moved outside the box. The user-controlled markup has broken our pageâs markup and layout.
Weâve talked about working around this by wrapping all user-controlled markup in <object> tags (http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4065), or possibly putting each comment inside <table><td> (http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4066). I donât think there are many ways to break outside of those in the parser. But this solution (if it actually works) is obviously a bit of a hack.
It would be great if there were a way to tell the browser to keep the user-controlled markup from breaking out of its container element, both in terms of the DOM and visually.
<iframe seamless sandbox srcdoc> seemed like a solution for this, but seamless iframes have been removed from the spec. (This solution is extra appealing because of the browser-provided sandboxing; perhaps we could someday get rid of the sanitizing we do on our servers to remove scripts, etc.)
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the webkit-unassigned