[webkit-help] Feedback about Content Blocking Extensions from Adblock Plus

Wed Jun 17 01:31:33 PDT 2015

On Mon, Jun 15, 2015 at 5:42 AM, Benjamin Poulain <benjamin at webkit.org>
wrote:
>
> Did you already file radars for the issues? If you did, can you give the
> radar numbers? I'll link them with the meta radars tracking the feature
> requests we are getting for content blockers. If you did not file radars,
> I'll do that.
>

I didn't.

> The content blockers in WebKit are vastly different from what extensions do
> today. As such, a solution that works well for classic extensions may not
> be the best way to solve the same problem in content blockers.
>

Sure. Previously we could simply match our filters against requests. Now
we need to convert our filters into your block list format, which in its
current state supports only a subset of our filters.
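
To make the conversion problem concrete, here is a minimal sketch of what such a converter might look like. It is not the actual Adblock Plus code: the function name, the type mapping and the generated regular expression are assumptions for illustration, and it deliberately handles only domain-anchored blocking filters. Anything it cannot express is returned as None, which is where the "subset" shows up.

```python
import json
import re

# Assumed mapping from Adblock Plus type options to content blocker
# resource types. Options missing here have no counterpart in this sketch.
TYPE_MAP = {
    "image": "image",
    "script": "script",
    "stylesheet": "style-sheet",
    "font": "font",
    "media": "media",
}


def convert_filter(text):
    """Convert a simple blocking filter to a rule dict, or None if unsupported."""
    pattern, _, options = text.partition("$")
    if not pattern.startswith("||"):
        return None  # only domain-anchored patterns are handled here
    host = pattern[2:].rstrip("^")
    if not re.fullmatch(r"[a-z0-9.-]+", host):
        return None  # wildcards, paths etc. are out of scope for this sketch
    # Match the host itself or any subdomain, followed by a path or port.
    url_filter = r"^https?://([^/]+\.)?" + host.replace(".", r"\.") + r"[/:]"
    trigger = {"url-filter": url_filter}
    types = []
    for option in filter(None, options.split(",")):
        if option not in TYPE_MAP:
            return None  # e.g. xmlhttprequest, object-subrequest, subdocument
        types.append(TYPE_MAP[option])
    if types:
        trigger["resource-type"] = types
    return {"trigger": trigger, "action": {"type": "block"}}


if __name__ == "__main__":
    rule = convert_filter("||ads.example.com^$image,script")
    print(json.dumps([rule], indent=2))
```

A filter such as ||example.com^$xmlhttprequest comes back as None here, because the sketch has no resource type to map it to.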

> If you tell us about the actual problems (for example, a website where
> you can't filter a resource), it would be easier for us to
> identify what we can do.
>

I looped in Arthur (aka MontzA, the EasyList author). I hope he can
provide some more concrete examples. He might also be aware of some use
cases I haven't pointed out yet.

>  1. Most importantly, our exception rules are recursive. For example,
> ||example.com$document not only prevents documents loaded from example.com
> from being blocked; resources loaded as part of that document or in any
> of its subframes or their subframes aren't blocked either. However,
> this logic doesn't seem to be possible with the ignore-previous-rules
> action. A recursive flag would come in handy here.
>
> That seems feasible. I have a couple of ideas on how to best achieve this.
>
> Including the subframes is a bit worrying to me. A subframe of a trusted
> source is typically not to be trusted. Do you have examples where that is
> useful?
>

Yes, this is what happens when the user disables Adblock Plus on a website
(e.g. in the popover). An exception rule like that is then added. If you
disable ad blocking on a website, you expect that nothing gets blocked
in any subframe either. Some filters in EasyList also take advantage of
that.
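
As a rough sketch of the recursive semantics described above (not the actual Adblock Plus implementation; the Frame class and function names are made up for illustration), a request is exempt when its own frame or any ancestor document is covered by an exception:

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Frame:
    """Hypothetical frame record: its document URL and its parent frame."""
    url: str
    parent: Optional["Frame"] = None


def host_matches(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def is_exempt(frame, exempt_domains):
    """True if the frame or any ancestor document is covered by an exception."""
    current = frame
    while current is not None:
        if any(host_matches(current.url, d) for d in exempt_domains):
            return True
        current = current.parent  # walk up towards the top-level document
    return False
```

With example.com in exempt_domains, a frame nested two levels below a page on example.com is exempt even if the frame itself is served from an unrelated host.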

>  2. There doesn't seem to be a way to distinguish between document and
> subdocument requests. While Adblock Plus blocks frames, it never blocks the
> top level document, so that users can still access the resource that is
> blocked, when entering its URL in the address bar.
>
> This sounds like a good idea for your use case.
>
> Any suggestion on the format? What would be the best way to specify this
> in your opinion?
>

I'd suggest making the type "document" match only top-level documents and
adding another type, "subdocument", which matches only subframes.

>  3. A dedicated resource-type for XMLHttpRequests, objects (requests
> loading a Flash element) and object subrequests (subsequent requests issued
> by a Flash object) would certainly be useful as well. EasyList has quite
> a few filters specifically checking for those.
>
> Targeting XHR specifically seems very easy to counter to me. Couldn't one
> just use the Fetch API or Sockets to work around the rule?
>

I don't think so. Note that with the new content blocking API you can no
longer run code on each request. Even if you could, you probably wouldn't
want to repeat requests just to retrieve additional metadata, and the
response still wouldn't tell you in which context the request originally
occurred.

> Do you have an example where the distinction matters?
>

I'm aware of some websites (e.g. porhub.com) which currently try to
circumvent ad blockers by reloading ads that initially failed to load,
using an XMLHttpRequest to a seemingly random URL. One way to tackle that
would be to block all XMLHttpRequests there. However, blocking object
requests (i.e. Flash) as well would break functionality there.

> Regarding the object subrequest, that seems like a valuable thing to do.
>

Yes, blocking object sub-requests (like you can do on Chrome and Firefox)
is essential to block in-video ads and tracking in Flash.

> There is a technical reason why you cannot modify/add/delete individual
> rules. In the engine, the rules are combined into giant state machines. The
> concept of rule does not exist past the compiler, after that all we have is
> a very simple bytecode (
> http://trac.webkit.org/browser/trunk/Source/WebCore/contentextensions/DFABytecode.h)
> that executes several thousands triggers at once.
>
> Note that compiling is not cheap. We are paying compile time when loading
> rules in exchange for faster runtime and lower memory footprint.
>

Note that it probably gets even worse, since we have to convert all of our
filters into your block list format whenever any filter changes.

> How often do you need to update the rules?
>

If the user doesn't touch the settings, the rules will only be updated when
filter lists get downloaded (once a day). However, toggling
"Disabled/Enabled on this site" from the popover, for example, results in a
filter change as well.

One more thing I forgot to mention in my first email: Adblock Plus
"collapses" blocked elements. That means if a request is blocked, elements
loaded from that URL are set to "display: none;". To do that, we
currently send a message from the content script to the global page for
every element that might have been blocked, to determine whether to
collapse it. A "collapse" option, or a way to combine the "block" and
"css-display-none" actions in the block list, would come in handy and would
be more efficient.
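
For illustration, this is roughly what approximating "collapse" looks like if block and css-display-none stay separate actions. It is only a sketch under assumptions: the function name is made up, the attribute selector is a crude stand-in for real URL matching, and the second rule applies on every page rather than only where a request was actually blocked.

```python
def block_and_hide_rules(host):
    """Return a pair of rules: block requests to host, and hide matching elements."""
    escaped = host.replace(".", r"\.")
    block = {
        "trigger": {"url-filter": r"^https?://([^/]+\.)?" + escaped + r"[/:]"},
        "action": {"type": "block"},
    }
    hide = {
        # The trigger matches the page being loaded, so ".*" means "everywhere".
        "trigger": {"url-filter": ".*"},
        "action": {
            "type": "css-display-none",
            # Rough approximation: elements whose src mentions the host.
            "selector": '[src*="' + host + '/"]',
        },
    }
    return [block, hide]
```

Every blocked host needs two entries this way, and the selector can hide elements whose request was never blocked, which is why a combined action would be preferable.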

Sebastian