[webkit-dev] Content sniffing in WebCore

Adam Barth abarth at webkit.org
Thu Oct 23 12:33:19 PDT 2008


By the way, my colleague Juan has been doing an analysis of
CFNetwork's content sniffing algorithm and it looks like CFNetwork
doesn't have a heuristic for GIF.  Based on our data, this is the
second most important heuristic.  Safari should see a noticeable
compatibility gain from this convergence effort.

Adam


On Thu, Oct 9, 2008 at 10:38 AM, Adam Barth <abarth at webkit.org> wrote:
> Currently, every WebKit port has to implement its own content sniffing
> algorithm.  This is problematic for compatibility and security.  We
> should implement a content sniffing algorithm in WebCore so that it
> can be used by every port.
>
> Background
>
> A number of web servers don't properly set the Content-Type header
> when they serve responses.  One common misconfiguration is to not send
> a Content-Type header at all or to send a bogus Content-Type header
> (e.g., with a value like "(null)" or "application/unknown").  To
> render these sites correctly, all browsers employ content sniffing
> algorithms that look at the contents of the response to determine the
> type of the resource.
>
> Some browsers have very aggressive content sniffing algorithms that
> often change the type of a resource.  This can be dangerous if a web
> server allows users to upload content, such as images, and the browser
> treats these resources as HTML because this lets an attacker XSS the
> site.  Designing a content sniffing algorithm is a careful balancing
> act between compatibility and security.
>
> WebKit
>
> WebKit itself does not contain a content sniffing algorithm, leaving
> each port to design its own.  For example, Safari and Chromium each
> implement their own content sniffing algorithm and I imagine (although
> I haven't tested) that other ports do so as well.  This causes
> unnecessary compatibility issues between different WebKit ports and
> leaves each port to fend for itself in avoiding the security
> pitfalls.
>
> I think it makes sense for WebCore itself to implement one content
> sniffing algorithm that every port can use.  One starting point for
> this common implementation is the Chromium content sniffer, which is
> open source.  A number of Chromium contributors, myself included, have
> spent a lot of effort tuning that content sniffer to maximize
> compatibility while minimizing attack surface, and we'd like everyone
> to benefit from our efforts.
>
> Standardization
>
> We've also been working with the HTML 5 working group on standardizing
> content sniffing algorithms across all browsers.  Eventually, I'd like
> to see WebKit's content sniffer converge with the HTML 5
> specification.  This process will likely involve the WebKit content
> sniffer and the HTML 5 specification evolving over time towards
> convergence.
>
> Feedback
>
> I'm sending this email to the list to get buy-in from the rest of the
> WebKit community on the general direction of implementing a content
> sniffer.  I'd also like specific feedback about which content sniffing
> heuristics you think are important to include.  As a starting point
> for discussion, you can see the Chromium content sniffer here:
>
> http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc?view=markup
>
> The top of that file has some comments that explain some of the
> guiding design choices in the algorithm and a comparison with the
> behavior of some other browsers.
>
> Adam
>
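
For illustration, the approach discussed in this thread can be sketched in
C++: inspect the body only when the Content-Type header is missing or bogus,
and then apply a signature heuristic such as the GIF one mentioned above.
This is a minimal sketch, not the Chromium or CFNetwork implementation.  The
function names (shouldSniff, looksLikeGif, effectiveType) and the
"application/octet-stream" fallback are assumptions made for the example.
The bogus header values come from the quoted message, and the GIF signatures
"GIF87a" and "GIF89a" are the standard ones.

```cpp
#include <string>

// Hypothetical helper: sniff only when the declared Content-Type is absent
// or is one of the bogus values mentioned in the thread.
bool shouldSniff(const std::string& declaredType) {
    return declaredType.empty()
        || declaredType == "(null)"
        || declaredType == "application/unknown";
}

// Hypothetical helper: a GIF response body begins with the six-byte ASCII
// signature "GIF87a" or "GIF89a".
bool looksLikeGif(const std::string& body) {
    return body.compare(0, 6, "GIF87a") == 0
        || body.compare(0, 6, "GIF89a") == 0;
}

// Returns the type to treat the response as.  A plausible declared type is
// trusted as-is, so an uploaded image is never reinterpreted as HTML.
std::string effectiveType(const std::string& declaredType,
                          const std::string& body) {
    if (!shouldSniff(declaredType))
        return declaredType;
    if (looksLikeGif(body))
        return "image/gif";
    // Assumed conservative fallback: treat unknown content as opaque bytes.
    return "application/octet-stream";
}
```

Calling effectiveType("", body) on a body that starts with "GIF89a" yields
"image/gif", while a response declared as "image/png" keeps that type no
matter what its body contains.  Restricting sniffing to missing or bogus
headers is one way to limit the XSS risk described in the quoted message.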

