Content sniffing in WebCore

9 Oct 2008

Currently, every WebKit port has to implement its own content sniffing
algorithm.  This is problematic for compatibility and security.  We
should implement a content sniffing algorithm in WebCore so that it
can be used by every port.

Background

A number of web servers don't properly set the Content-Type header
when they serve responses.  One common misconfiguration is to not send
a Content-Type header at all or to send a bogus Content-Type header
(e.g., with a value like "(null)" or "application/unknown").  To
render these sites correctly, all browsers employ content sniffing
algorithms that look at the contents of the response to determine the
type of the resource.
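
To make the idea concrete, here is a minimal sketch of sniffing that only
kicks in when the declared type is missing or bogus.  It is not the
Chromium or Safari algorithm; the function names (isMissingOrBogus,
startsWith, effectiveType) are hypothetical, the list of bogus values is
just the two mentioned above, and the signatures checked are the
well-known leading bytes of PNG, GIF, and PDF files.

```cpp
#include <string>

// Hypothetical helper: true when the declared Content-Type carries no
// usable information (absent, or one of the bogus values named above).
bool isMissingOrBogus(const std::string& declaredType)
{
    return declaredType.empty()
        || declaredType == "(null)"
        || declaredType == "application/unknown";
}

// Hypothetical helper: true when `body` begins with the bytes in `magic`.
bool startsWith(const std::string& body, const std::string& magic)
{
    return body.size() >= magic.size()
        && body.compare(0, magic.size(), magic) == 0;
}

// Returns the declared type when it is usable; otherwise guesses a type
// from the first bytes of the response body.
std::string effectiveType(const std::string& declaredType,
                          const std::string& body)
{
    if (!isMissingOrBogus(declaredType))
        return declaredType;
    if (startsWith(body, "\x89PNG\r\n\x1a\n"))
        return "image/png";
    if (startsWith(body, "GIF87a") || startsWith(body, "GIF89a"))
        return "image/gif";
    if (startsWith(body, "%PDF-"))
        return "application/pdf";
    // Assumption for this sketch: unknown content falls back to a
    // generic binary type instead of being guessed at further.
    return "application/octet-stream";
}
```

In this sketch a response declared as "text/plain" keeps that type even
if its body begins with "GIF89a"; only a missing or bogus header lets the
body decide.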

Some browsers have very aggressive content sniffing algorithms that
often change the type of a resource.  This can be dangerous if a web
server allows users to upload content, such as images, and the browser
then treats these resources as HTML, because this lets an attacker XSS
the site.  Designing a content sniffing algorithm is a careful balancing
act between compatibility and security.
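
One conservative way to strike that balance can be sketched as follows.
This is only an illustration under stated assumptions, not what any
particular browser does: the names (beginsWithTag, looksLikeHtml,
mayRenderAsHtml) are hypothetical, and the policy shown is simply "never
upgrade a response to HTML when the server declared a usable type".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: case-insensitive check that `body` begins with
// `tag`.  `tag` must be supplied in lower case.
bool beginsWithTag(const std::string& body, const std::string& tag)
{
    if (body.size() < tag.size())
        return false;
    for (std::size_t i = 0; i < tag.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(body[i])) != tag[i])
            return false;
    }
    return true;
}

// Hypothetical heuristic: the body opens with a doctype or an <html> tag.
bool looksLikeHtml(const std::string& body)
{
    return beginsWithTag(body, "<!doctype html")
        || beginsWithTag(body, "<html");
}

// Assumed policy for this sketch: honor an explicit text/html, sniff for
// HTML only when the declared type is missing or bogus, and never treat
// a response with any other declared type (such as image/png) as HTML.
bool mayRenderAsHtml(const std::string& declaredType,
                     const std::string& body)
{
    if (declaredType == "text/html")
        return true;
    const bool noUsableType = declaredType.empty()
        || declaredType == "(null)"
        || declaredType == "application/unknown";
    return noUsableType && looksLikeHtml(body);
}
```

Under this policy an uploaded file served as "image/png" is never
rendered as HTML, even if its bytes begin with "<html>", while a page
served with no Content-Type at all can still be displayed.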

WebKit

WebKit itself does not contain a content sniffing algorithm, leaving
each port to design its own.  For example, Safari and Chromium each
implement their own content sniffing algorithm, and I imagine (although
I haven't tested) that other ports do so as well.  This causes
unnecessary compatibility issues between different WebKit ports and
leaves each port to fend for itself in avoiding the security
pitfalls.

I think it makes sense for WebCore itself to implement one content
sniffing algorithm that every port can use.  One starting point for
this common implementation is the Chromium content sniffer, which is
open source.  A number of Chromium contributors, myself included, have
spent a lot of effort tuning that content sniffer to maximize
compatibility while minimizing attack surface, and we'd like everyone
to benefit from our efforts.

Standardization

We've also been working with the HTML 5 working group on standardizing
content sniffing algorithms across all browsers.  Eventually, I'd like
to see WebKit's content sniffer converge with the HTML 5
specification.  This will likely involve both the WebKit content
sniffer and the HTML 5 specification changing over time until they
agree.

Feedback

I'm sending this email to the list to get buy-in from the rest of the
WebKit community on the general direction of implementing a content
sniffer.  I'd also like specific feedback about which content sniffing
heuristics you think are important to include.  As a starting point
for discussion, you can see the Chromium content sniffer here:

http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc?vie...

The top of that file has some comments that explain some of the
guiding design choices in the algorithm and a comparison with the
behavior of some other browsers.

Adam

Adam Barth
