[webkit-dev] innerStaticHTML

Tue Nov 24 23:46:48 PST 2009

>>> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-June/020191.html
> I think we should experiment with the minimal API that seems useful.
> If the experiment is a success, we can scale it up.

Apologies if I am rehashing something discussed earlier, but I think it
would be easy to run into some subtle problems when an API such as
.safeInnerHTML is mixed with .innerHTML.

The approach where input is sanitized upon assignment (and the
original is not stored anywhere) is possibly the best, but may still
lead to trouble. One concern is the behavior of innerHTML on <xmp>
or <textarea> elements, which seems inconsistent and potentially dangerous
across browsers; and in general, innerHTML manipulation following
safeInnerHTML assignments seems like an easy way to accidentally mess
things up on the web application side.

Deferred sanitization triggered by safeInnerHTML assignments is
another possibility, but it creates a whole lot of other issues, e.g.:

foo.safeInnerHTML = '<a href="' + user_string + '">...</a>';
foo.innerHTML += '<br>Ta-dah!';

...or:

foo.safeInnerHTML = '<a href="' + user_string + '">...</a>';
bar.innerHTML = foo.safeInnerHTML;
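
To make the two snippets above concrete, here is a minimal sketch of how
they could go wrong. It is a toy model, not real DOM behavior: MockElement,
sanitize() and render() are hypothetical names, the regex "sanitizer" only
strips quoted on* attributes, and I am assuming that deferred sanitization
means the raw string is stored and only cleaned at render time, and that
reading .safeInnerHTML returns that raw string.

```javascript
// Toy sanitizer (assumption): strips double-quoted on*="..." attributes only.
function sanitize(html) {
  return html.replace(/\s+on\w+\s*=\s*"[^"]*"/gi, '');
}

// Hypothetical stand-in for an element with deferred sanitization.
class MockElement {
  constructor() {
    this.raw = '';
    this.needsSanitizing = false;
  }
  // Deferred model: keep the original string, remember to clean it later.
  set safeInnerHTML(html) {
    this.raw = html;
    this.needsSanitizing = true;
  }
  // Assumption: reading back returns the stored, unsanitized original.
  get safeInnerHTML() {
    return this.raw;
  }
  // Plain innerHTML knows nothing about the pending sanitization.
  set innerHTML(html) {
    this.raw = html;
    this.needsSanitizing = false;
  }
  get innerHTML() {
    return this.raw;
  }
  // What would actually reach the page.
  render() {
    return this.needsSanitizing ? sanitize(this.raw) : this.raw;
  }
}

const user_string = '" onmouseover="alert(1)';

// Case 1: += reads the raw string and writes it back through innerHTML.
const foo = new MockElement();
foo.safeInnerHTML = '<a href="' + user_string + '">...</a>';
const renderedBefore = foo.render(); // handler stripped at render time
foo.innerHTML += '<br>Ta-dah!';
const renderedAfter = foo.render(); // handler is back

// Case 2: copying the raw string into another element via innerHTML.
const src = new MockElement();
src.safeInnerHTML = '<a href="' + user_string + '">...</a>';
const bar = new MockElement();
bar.innerHTML = src.safeInnerHTML;
const renderedCopy = bar.render(); // handler survives in the copy
```

In this model the "needs sanitizing" state lives on the element rather than
on the string, so any path that moves the string through .innerHTML silently
drops it.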

Tainting elements that had their contents accessed via .safeInnerHTML,
and then tracking and propagating this data, is one way to avoid
problems - but it introduces significant complexity and would be
pretty opaque to developers; especially when dealing with .innerHTML
on outer or nested elements.

I think the syntax outlined in Adam's post is much safer, would
definitely pose far fewer implementation-level challenges, and
would introduce fewer gotchas for web app developers by tying
sanitization to a well-defined output container, rather than one of
several content access methods.

More importantly, it is also easily extensible to a solution that
could be utilized by non-JS pages, as its logical 1:1 equivalent would
be a nonced, locked element, say:

<span secure_mode="$random_server_generated_nonce">
...unsanitized user content...
</span secure_mode="$random_server_generated_nonce">

This is not a fortunate syntax, but it illustrates the idea reasonably
well. The boundaries are guarded, so the approach is safe as long as
the server can produce decent random nonces. The payoff is that
server-generated pages get lightweight sanitization without the need
to do all content rendering dynamically with client-side JS - which
would make a huge difference.
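
A rough sketch of the server side of this, assuming Node.js and its
built-in crypto module. wrapUntrusted() and extractGuarded() are
hypothetical helpers, and secure_mode is only the illustrative syntax from
above, not something any browser implements; extractGuarded() merely models
how a supporting parser might locate the guarded region.

```javascript
const crypto = require('crypto');

// Hypothetical server-side helper: wrap untrusted content in a container
// whose closing tag must repeat a fresh, unpredictable nonce.
function wrapUntrusted(content) {
  const nonce = crypto.randomBytes(16).toString('hex'); // 128 random bits
  const open = '<span secure_mode="' + nonce + '">';
  const close = '</span secure_mode="' + nonce + '">';
  return { nonce: nonce, html: open + '\n' + content + '\n' + close };
}

// Toy model of the parser side: everything up to the closing tag that
// carries the same nonce is treated as guarded content. Returns null if
// the boundaries for this nonce are not present.
function extractGuarded(html, nonce) {
  const open = '<span secure_mode="' + nonce + '">';
  const close = '</span secure_mode="' + nonce + '">';
  const start = html.indexOf(open);
  if (start === -1) return null;
  const end = html.indexOf(close, start + open.length);
  if (end === -1) return null;
  return html.slice(start + open.length, end);
}

// An attacker who cannot predict the nonce cannot close the container:
// both a plain </span> and a guessed nonce stay inside the guarded region.
const hostile = '</span></span secure_mode="guess"><script>alert(1)</script>';
const page = wrapUntrusted(hostile);
const guarded = extractGuarded(page.html, page.nonce);
```

The nonce has to be generated per response (or per container) from a
cryptographically strong source; a fixed or guessable value would let user
content close the container early.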

/mz