[Webkit-unassigned] [Bug 156626] New: Would like a way to prevent user-controlled markup from breaking out of an element

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Apr 15 08:21:27 PDT 2016


https://bugs.webkit.org/show_bug.cgi?id=156626

            Bug ID: 156626
           Summary: Would like a way to prevent user-controlled markup
                    from breaking out of an element
    Classification: Unclassified
           Product: WebKit
           Version: Safari Technology Preview
          Hardware: Macintosh
                OS: Unspecified
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: WebCore Misc.
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: aroben at webkit.org

Created attachment 276475
  --> https://bugs.webkit.org/attachment.cgi?id=276475&action=review
screenshot of broken page layout due to user-controlled markup

github.com displays lots of user-controlled markup in comments on issues and pull requests. Users submit Markdown (which can include raw HTML) and we transform it to HTML and display it on our pages. Serving user-controlled markup poses a security risk, which we mitigate in two ways:

1. We sanitize the HTML that comes out of our Markdown parser. We only allow certain elements, attributes, and attribute values and strip out anything that doesn’t match our whitelist.
2. We use mechanisms like Content Security Policy to provide an extra layer of protection if any unwanted content slips through the sanitization.

But even with these mitigations it is still possible for users to unintentionally break our UI (and thus it may be possible for malicious users to use this for phishing etc.). One way for this to happen is via interactions between lists and tables in the HTML parser. A live example of this exists on https://github.com/mantisbt/mantisbt/pull/272. See the attached screenshot. (The rendering issues are less dramatic when you’re logged out, but still present.) I’ll explain what’s going on here in detail.

One of the comments on this page contains the following Markdown:

- echo '<div><table class="width75">', "\n",
+ echo '<div><table class="width-70">', "\n",

Clearly this is an excerpt of a diff, and if the user had enclosed it in triple-backticks (```) it would have been rendered as such. But since it is not escaped by backticks, it gets interpreted as an unordered list, generating the following HTML:

<ul>
<li>echo &#39;<div><table class="width75">&#39;, &quot;\n&quot;,</li>
<li>echo &#39;<div><table class="width-70">&#39;, &quot;\n&quot;,</li>
</ul>

This HTML is then sent through our HTML sanitizer. The sanitizer is based on an HTML-compliant parser (https://github.com/google/gumbo-parser) which parses this HTML as a document fragment, strips out elements, attributes, and attribute values that don’t match our whitelist, and then reserializes the document fragment to produce the following HTML:

<ul>
<li>echo '<div>', "\n",<li>echo '<div></div></li><table>
</table>', "\n",<table>
</table></div></li></ul>

Looking at the innerHTML view on the Live DOM Viewer shows the same result (other than the class attributes that the sanitizer removed): http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4062

This sanitized HTML is then inserted into our page’s markup. You can see the skeleton of our page’s markup here: http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4063

Inserting the sanitized HTML between the “user content start/end” comments gives this: http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4064

You can see that the “this should be inside the box” text has moved outside the box. The user-controlled markup has broken our page’s markup and layout.

We’ve talked about working around this by wrapping all user-controlled markup in <object> tags (http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4065), or possibly putting each comment inside <table><td> (http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4066). I don’t think there are many ways to break outside of those in the parser. But this solution (if it actually works) is obviously a bit of a hack.

It would be great if there were a way to tell the browser to keep the user-controlled markup from breaking out of its container element, both in terms of the DOM and visually.

<iframe seamless sandbox srcdoc> seemed like a solution for this, but seamless iframes have been removed from the spec. (This solution is extra appealing because of the browser-provided sandboxing; perhaps we could someday get rid of the sanitizing we do on our servers to remove scripts, etc.)

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.webkit.org/pipermail/webkit-unassigned/attachments/20160415/2b81ace6/attachment.html>


More information about the webkit-unassigned mailing list