<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jun 15, 2015 at 5:42 AM, Benjamin Poulain <span dir="ltr"><<a href="mailto:benjamin@webkit.org" target="_blank">benjamin@webkit.org</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
Did you already file radars for the issues? If you did, can you give
the radar numbers? I'll link them to the meta radars tracking the
feature requests we are getting for content blockers. If you did
not file radars, I'll do that.</div></blockquote><div> </div><div>I didn't.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">The content blockers in WebKit are vastly different from what
extensions do today. As such, a solution that works well for classic
extensions may not be the best way to solve the same problem in
content blockers.</div></blockquote><div><br></div><div>Sure. Previously we could simply match our filters against requests. Now we need to convert our filters into your block list format, which in its current state only supports a subset of our filters.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">If you tell us about the actual problems (for
example, a website where you can't filter a resource),
it would be easier for us to identify what we can do.</div></blockquote><div><br></div><div>I looped in Arthur (a.k.a. MontzA, the EasyList author). I hope he can provide more concrete examples. He may also know of use cases I haven't mentioned yet.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span>
<blockquote type="cite">
<div dir="ltr">
<div>1. Most importantly, our exception rules are recursive. For
example ||<a href="http://example.com" target="_blank">example.com</a>$document
not only prevents documents loaded from <a href="http://example.com" target="_blank">example.com</a>
from being blocked, but also prevents any resources loaded as part of
that document, or in any of its subframes (and their subframes), from
being blocked. However, this logic doesn't seem to be possible with
the ignore-previous-rules action. A recursive flag would come in
handy here.</div>
</div>
</blockquote></span>
That seems feasible. I have a couple of ideas on how to best achieve
this.<br>
<br>
Including the subframes is a bit worrying to me. A subframe of a
trusted source is typically not to be trusted. Do you have examples
where that is useful?</div></blockquote><div><br></div><div>Yes. This is what happens when the user disables Adblock Plus on a website (e.g. from the popover): an exception rule like that is added. If you disable ad blocking on a website, you expect nothing to be blocked in any of its subframes either. Some filters in EasyList take advantage of this behavior as well.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span>
<blockquote type="cite">
<div dir="ltr">
<div>2. There doesn't seem to be a way to distinguish between
document and subdocument requests. While Adblock Plus blocks
frames, it never blocks the top-level document, so that users
can still access a blocked resource by entering its URL in the
address bar.</div>
</div>
</blockquote></span>
This sounds like a good idea for your use case.<br>
<br>
Any suggestion on the format? What would be the best way to specify
this in your opinion?</div></blockquote><div> </div><div>I'd suggest making the type "document" match only top-level documents, and adding another type, "subdocument", that matches only subframes.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span>
<blockquote type="cite">
<div dir="ltr">
<div>3. A dedicated resource-type for XMLHttpRequests, objects
(requests loading a Flash element) and object subrequests
(subsequent requests issued by a Flash object) would certainly
be useful as well. EasyList has quite a few filters that
specifically check for those.</div>
</div>
</blockquote></span>
Targeting XHR specifically seems very easy to counter to me.
Couldn't one just use the Fetch API or Sockets to work around the
rule?<br></div></blockquote><div><br></div><div>I don't think so. Note that with the new content blocking API you can no longer run code per request. Even if you could, you probably wouldn't want to repeat requests just to retrieve additional metadata, and even then the response wouldn't tell you in which context the request originally occurred.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
Do you have an example where the distinction matters?<br></div></blockquote><div><br></div><div>I'm aware of some websites (e.g. <a href="http://porhub.com">porhub.com</a>) that currently try to circumvent ad blockers by re-requesting ads that initially failed to load, using an XMLHttpRequest to a fairly random URL. One way to tackle that would be to block all XMLHttpRequests on those sites. However, blocking object requests (i.e. Flash) along with them would break functionality there.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
Regarding the object subrequest, that seems like a valuable thing to
do.</div></blockquote><div><br></div><div>Yes, blocking object subrequests (as is possible in Chrome and Firefox) is essential for blocking in-video ads and tracking in Flash.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
There is a technical reason why you cannot modify/add/delete
individual rules. In the engine, the rules are combined into giant
state machines. The concept of a rule does not exist past the
compiler; after that, all we have is a very simple bytecode
(<a href="http://trac.webkit.org/browser/trunk/Source/WebCore/contentextensions/DFABytecode.h" target="_blank">http://trac.webkit.org/browser/trunk/Source/WebCore/contentextensions/DFABytecode.h</a>)
that executes several thousand triggers at once.<br>
<br>
Note that compiling is not cheap. We pay the compile time when
loading rules in exchange for faster runtime and a lower memory
footprint.</div></blockquote><div><br></div><div>Note that it probably gets even worse, since we have to convert all of our filters into your block list format whenever any filter changes.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">How often do you need to update the rules?<br></div></blockquote><div><br></div><div>If the user doesn't touch the settings, the rules are only updated when filter lists get downloaded (once a day). However, toggling "Disabled/Enabled on this site" from the popover, for example, results in a filter change as well.</div><div><br></div><div>One more thing I forgot to mention in my first email: Adblock Plus "collapses" blocked elements. That means if a request is blocked, the elements that loaded from that URL are set to "display: none;". To do this, we currently send a message from the content script to the global page for every element that might have been blocked, to determine whether to collapse it. A "collapse" option, or a way to combine the "block" and "css-display-none" actions in the block list, would come in handy and would be more efficient.</div><div><br></div><div>Sebastian</div></div></div></div>
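With the rule format as it stands, a converter can only approximate the requested "collapse" behaviour. For each blocking filter, it can emit a "block" rule plus a companion "css-display-none" rule. The following is a minimal sketch under stated assumptions. `block_and_collapse` and `COLLAPSIBLE_TAGS` are hypothetical names. The url-filter is a simplified stand-in for converting a "||domain^" filter, not Adblock Plus's actual conversion. The `[src*=...]` selector only approximates "the element whose request was blocked".

```python
import json

# Tags whose "src" attribute names the resource they load. This list is an
# assumption for illustration; a real converter would need more (e.g. srcset,
# <object data=...>).
COLLAPSIBLE_TAGS = ("img", "iframe", "embed", "video", "audio")


def block_and_collapse(domain: str) -> list:
    """Hypothetical helper: emulate Adblock Plus "collapsing" for one blocked
    domain by pairing a "block" rule with a "css-display-none" rule."""
    host = domain.replace(".", "\\.")  # escape dots for the url-filter regex
    block_rule = {
        # Simplified stand-in for "||domain^": the domain or any subdomain,
        # followed by a port separator or the start of the path.
        "trigger": {"url-filter": "^https?://([^/]+\\.)?" + host + "[:/]"},
        "action": {"type": "block"},
    }
    # Hide elements whose src contains the domain string. This is only an
    # approximation: it can match URLs that merely mention the domain (e.g. in
    # a query string) and it ignores srcset.
    selector = ", ".join(f'{tag}[src*="{domain}"]' for tag in COLLAPSIBLE_TAGS)
    collapse_rule = {
        "trigger": {"url-filter": ".*"},  # apply the selector on every page
        "action": {"type": "css-display-none", "selector": selector},
    }
    return [block_rule, collapse_rule]


if __name__ == "__main__":
    print(json.dumps(block_and_collapse("ads.example.com"), indent=2))
```

The selector is keyed on a URL substring rather than on the request that was actually blocked. It can therefore hide elements that were never blocked, or miss ones that were. A native "collapse" option would avoid that mismatch.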