[webkit-help] Regular expressions for content blocking
Romain Jacquinot
rjacquinot at me.com
Mon Aug 17 11:03:06 PDT 2015
Hi,
For now, the following regular expression features are supported by content blockers:
Matching any character with “.”.
Matching ranges with the range syntax [a-b].
Quantifying expressions with “?”, “+” and “*”.
Groups with parentheses.
Beginning of line (“^”) and end of line (“$”) markers.
However, there doesn’t seem to be a way to match one of several alternatives with “|”, or to match any character not listed between brackets with “[^]”.
This is an issue when you want to block addresses like http://www.example.com/, https://example.com/foobar.jpg, http://example.com:8080 but not http://example.com.hk.
With at least one of those features, you could write something like:
{
    "action" : {
        "type" : "block"
    },
    "trigger" : {
        "url-filter" : "^https?://(www\\.)?example\\.com(/|:|\\?)+"
    }
}
or:
{
    "action" : {
        "type" : "block"
    },
    "trigger" : {
        "url-filter" : "^https?://(www\\.)?example\\.com[^.]"
    }
}
Please note that in this case, the if-domain field wouldn’t help for embedded content.
Should I write the same rule several times, once for each case (“/”, “:”, “?”)? That doesn’t feel like a very elegant solution. Since these rules share the same prefix, will they be optimized? The WebKit blog says: “The rules are grouped by the prefix ‘https?://’, and it only counts as one rule with quantifiers.” Does this mean they will only count as one rule against the 50,000-rule limit?
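For reference, a sketch of the repeated-rule workaround, using only the features listed above. Each rule ends with one of the three delimiters, so http://example.com.hk is not matched. How these rules are grouped or counted is the open question and is not confirmed here:

```json
[
    {
        "action" : { "type" : "block" },
        "trigger" : { "url-filter" : "^https?://(www\\.)?example\\.com/" }
    },
    {
        "action" : { "type" : "block" },
        "trigger" : { "url-filter" : "^https?://(www\\.)?example\\.com:" }
    },
    {
        "action" : { "type" : "block" },
        "trigger" : { "url-filter" : "^https?://(www\\.)?example\\.com\\?" }
    }
]
```

None of these rules match a URL that ends immediately after “example.com”, because each one requires a delimiter character to follow the host name.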
Do you see an elegant solution to handle this case? If not, could you please consider adding at least one of those regular expression features for content blockers in Safari?
Thanks.