[webkit-help] Content blocker, performance issue
Benjamin Poulain
benjamin at webkit.org
Mon Jul 6 18:02:09 PDT 2015
Hi Tuan,
On 7/6/15 3:07 PM, Tuan Bach Quoc wrote:
> I was reading the documentation around the content blocker extension.
> There is one sentence that triggered my curiosity:
>
> If the rule compiler detects that a set of rules would negatively impact
> user experience, it refuses to load them and returns an error.
>
> Will the compiler check the performance of the rules only once, or every
> time a page is loaded?
>
> Basically my concern is that the performance of the set of rules is
> also impacted by the page the user is currently loading. The performance
> of a set of rules could be different depending on which website it is
> applied to, right?
>
> Could we have more details on how the compiler works and how it
> evaluates whether the rules would "negatively impact user experience"?
>
> Many thanks in advance for reading,
Basically, all the rules are combined into a few giant state machines.
First, the triggers are grouped into sets that work well together. Then
each group is transformed into equivalent non-deterministic automata
(NFAs). The NFAs are then transformed into deterministic machines (DFAs)
and minimized.
When the resulting machines are small, we combine them to reduce the
number of machines and reach a good machine-count-to-size ratio.
Each machine is then lowered to bytecode for execution.
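To make that pipeline concrete, here is a minimal sketch (Python, not
WebKit's actual code) that combines two toy literal prefix triggers into
one NFA, determinizes it with the subset construction, and runs a single
deterministic pass over a URL. The real compiler works on full regular
expressions and also does the minimization and bytecode lowering, none of
which is shown here.

    from itertools import count

    def build_nfa(patterns):
        # Combine toy literal prefix triggers into one NFA sharing a start state.
        transitions = {}        # (state, char) -> set of next states
        accepts = {}            # accepting NFA state -> rule index
        fresh = count(1)
        start = 0
        for rule_id, pattern in enumerate(patterns):
            state = start
            for ch in pattern:
                nxt = next(fresh)
                transitions.setdefault((state, ch), set()).add(nxt)
                state = nxt
            accepts[state] = rule_id
        return start, transitions, accepts

    def determinize(start, transitions, accepts):
        # Subset construction: each DFA state is a frozenset of NFA states.
        start_set = frozenset([start])
        dfa = {}                # (dfa_state, char) -> dfa_state
        dfa_accepts = {}        # dfa_state -> rule indices that fire there
        todo, seen = [start_set], {start_set}
        while todo:
            current = todo.pop()
            dfa_accepts[current] = {accepts[s] for s in current if s in accepts}
            moves = {}
            for (src, ch), targets in transitions.items():
                if src in current:
                    moves.setdefault(ch, set()).update(targets)
            for ch, targets in moves.items():
                nxt = frozenset(targets)
                dfa[(current, ch)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return start_set, dfa, dfa_accepts

    def run(url, start, dfa, dfa_accepts):
        # One deterministic pass over the URL; collect every rule that matches.
        fired, state = set(dfa_accepts[start]), start
        for ch in url:
            state = dfa.get((state, ch))
            if state is None:
                break
            fired |= dfa_accepts[state]
        return fired

    start, trans, acc = build_nfa(["https://ads.", "https://track."])
    s0, dfa, dfa_acc = determinize(start, trans, acc)
    print(run("https://ads.example/banner.js", s0, dfa, dfa_acc))   # {0}

The point of the construction is that no matter how many rules went in,
matching a URL against one machine is a single left-to-right pass with
one table lookup per character.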
A big rule list that has similar expressions ends up as just 10-1000
state machines. Evaluating them is usually very efficient (the worst
case being "128 * URL length * number of machines" instructions, which
is a small number).
Since we are dealing with deterministic state machines, we have an idea
of how fast they are ahead of time.
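To put that bound in perspective, here is a rough back-of-the-envelope
computation; the 128 figure is the per-character worst case quoted above,
while the URL length and machine count are just illustrative values:

    # All numbers illustrative: 128 instructions per URL character,
    # a 64-character URL, and 200 compiled machines.
    instructions = 128 * 64 * 200
    print(instructions)   # 1638400 -- a small number, as noted above

Even this pessimistic figure is only on the order of a million simple
instructions per evaluated URL.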
We could find the exact worst case by going through the graphs, but we
don't do that at the moment. Simple limits on the size and number of
machines seem to serve us well so far.
---
To answer your perf-per-page question: we do not track the performance
of extensions per page. In many cases, the time spent in Content
Blockers is so small that it is not measurable anyway.
What we do is place various limits in the compiler to make sure that the
machines stay efficient. We check those limits as we compile; if we go
over them, we give up on the compilation step and return an error.
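As an illustration only (the actual limits, their names, and their values
are WebKit internals and not spelled out here), a guard of that kind could
look something like:

    # Hypothetical compile-time guard sketching the "give up" behaviour
    # described above; the constants are made-up, not WebKit's real limits.
    MAX_MACHINES = 1000
    MAX_STATES_PER_MACHINE = 10000

    class RuleListTooComplexError(Exception):
        pass

    def check_limits(machines):
        # 'machines' is a list of compiled DFAs, each given by its state set.
        if len(machines) > MAX_MACHINES:
            raise RuleListTooComplexError("too many state machines")
        for states in machines:
            if len(states) > MAX_STATES_PER_MACHINE:
                raise RuleListTooComplexError("machine has too many states")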
---
The limits are still evolving. The current limits are not very
restrictive because we want developers to experiment and tell us what
works well and what doesn't.
There are many things we can do in the compiler when we hit a bad case,
but we need to know about it.
Many developers have been sending us their blocking lists and we are
making dramatic improvements in response to what we discover.
I invite you to email me your biggest blocking lists. This is the kind
of data that will help us improve those limits.
Benjamin