[webkit-dev] Content sniffing in WebCore
abarth at webkit.org
Thu Oct 23 12:33:19 PDT 2008
By the way, my colleague Juan has been doing an analysis of
CFNetwork's content sniffing algorithm and it looks like CFNetwork
doesn't have a heuristic for GIF. Based on our data, this is the
second most important heuristic. Safari should see a noticeable
compatibility gain from this convergence effort.
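To make the GIF point concrete, here is a minimal sketch of what such a heuristic could look like. It is not CFNetwork's or Chromium's code; the function names and the exact set of "bogus" values are assumptions for illustration only (the two bogus strings are the examples from my earlier message, quoted below). The only hard fact it relies on is that GIF data begins with the ASCII signature "GIF87a" or "GIF89a".

```python
# Hypothetical sketch: all names here are illustrative and are not taken
# from CFNetwork, Chromium, or WebCore.

# Every GIF file starts with one of these two six-byte signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")

# Assumed set of Content-Type values treated as "no useful type".
BOGUS_TYPES = {"", "(null)", "application/unknown"}


def needs_sniffing(content_type):
    """Return True when the header is missing or carries a bogus value."""
    if content_type is None:
        return True
    # Ignore parameters such as "; charset=..." and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in BOGUS_TYPES


def sniff_gif(body):
    """Return True when the response body starts with a GIF signature."""
    return body.startswith(GIF_SIGNATURES)


def effective_type(content_type, body):
    """Sniff only when the server gave no usable type; otherwise trust it."""
    if needs_sniffing(content_type) and sniff_gif(body):
        return "image/gif"
    return content_type
```

Note that this sketch never overrides a plausible Content-Type, which is one conservative way to keep the compatibility gain without widening the attack surface. A real sniffer would need many more heuristics and more careful rules about when to apply them.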
On Thu, Oct 9, 2008 at 10:38 AM, Adam Barth <abarth at webkit.org> wrote:
> Currently, every WebKit port has to implement its own content sniffing
> algorithm. This is problematic for compatibility and security. We
> should implement a content sniffing algorithm in WebCore so that it
> can be used by every port.
> A number of web servers don't properly set the Content-Type header
> when they serve responses. One common misconfiguration is to not send
> a Content-Type header at all or to send a bogus Content-Type header
> (e.g., with a value like "(null)" or "application/unknown"). To
> render these sites correctly, all browsers employ content sniffing
> algorithms that look at the contents of the response to determine the
> type of the resource.
> Some browsers have very aggressive content sniffing algorithms that
> often change the type of a resource. This can be dangerous if a web
> server allows users to upload content, such as images, and the browser
> treats these resources as HTML because this lets an attacker XSS the
> site. Designing a content sniffing algorithm is a careful balancing
> act between compatibility and security.
> WebKit itself does not contain a content sniffing algorithm, leaving
> each port to design its own. For example, Safari and Chromium each
> implement their own content sniffing algorithm and I imagine (although
> I haven't tested) that other ports do so as well. This causes
> unnecessary compatibility issues between different WebKit ports and
> leaves each port to fend for itself in avoiding the
> security pitfalls.
> I think it makes sense for WebCore itself to implement one content
> sniffing algorithm that every port can use. One starting point for
> this common implementation is the Chromium content sniffer, which is
> open source. A number of Chromium contributors, myself included, have
> spent a lot of effort tuning that content sniffer to maximize
> compatibility while minimizing attack surface, and we'd like everyone
> to benefit from our efforts.
> We've also been working with the HTML 5 working group on standardizing
> content sniffing algorithms across all browsers. Eventually, I'd like
> to see WebKit's content sniffer converge with the HTML 5
> specification. This process will likely involve the WebKit content
> sniffer and the HTML 5 specification evolving over time towards
> I'm sending this email to the list to get buy-in from the rest of the
> WebKit community on the general direction of implementing a content
> sniffer. I'd also like specific feedback about which content sniffing
> heuristics you think are important to include. As a starting point
> for discussion, you can see the Chromium content sniffer here:
> The top of that file has some comments that explain some of the
> guiding design choices in the algorithm and a comparison with the
> behavior of some other browsers.