<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Thank you very much Alex and Benjamin. Your answers were really helpful.<div class=""><br class=""></div><div class="">I wrongly thought the bracket syntax was only allowed for very basic ranges like [0-9] or [a-z] since the "Introduction to WebKit Content Blockers” only mentioned "Matching ranges with the range syntax [a-b]”.</div><div class=""><br class=""></div><div class="">I’m glad to know the full bracket syntax is actually supported.</div><div class=""><br class=""></div><div class="">Romain</div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class="">On Aug 17, 2015, at 9:59 PM, Benjamin Poulain <<a href="mailto:benjamin@webkit.org" class="">benjamin@webkit.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">Hi Romain,<br class=""><br class="">On 8/17/15 11:03 AM, Romain Jacquinot wrote:<br class=""><blockquote type="cite" class="">For now, the following regular expression features are supported by<br class="">content blockers:<br class=""><br class=""> * Matching any character with “.”.<br class=""> * Matching ranges with the range syntax [a-b].<br class=""> * Quantifying expressions with “?”, “+” and “*”.<br class=""> * Groups with parenthesis.<br class=""> * Beginning of line (“^”) and end of line (“$”) marker<br class=""><br class="">However, there doesn’t seem to be a way to find any of the alternatives<br class="">specified with “|” or find any character not between the brackets "[^]”.<br class=""></blockquote><br class="">Actually the "[^]" character set syntax is supported.<br class=""><br class="">It could cause compile time issues on previous betas. 
That has been fixed in beta 5.<br class=""><br class=""><blockquote type="cite" class="">This is an issue when you want to block addresses like<br class="">http://www.example.com/,<br class="">https://example.com/foobar.jpg, http://example.com:8080 but not<br class="">http://example.com.hk.<br class=""></blockquote><br class="">The URLs are canonicalized before being processed by Content Blockers. That ensures some invariants on the format. For example, the domain name is always terminated by ":" or "/". The domain name is always lowercase.<br class=""><br class="">Typically, I write domain triggers like this:<br class=""><br class="">"trigger": {<br class=""> "url-filter": "^https://([^:/]+\\.)example\\.com[:/]",<br class=""> "url-filter-is-case-sensitive": true<br class="">}<br class=""><br class=""><br class=""><blockquote type="cite" class="">With at least one of those features, you could write something like:<br class=""><br class=""> {<br class="">"action" : {<br class="">"type" : "block"<br class=""> },<br class="">"trigger" : {<br class="">"url-filter": "^https?://(www\\.)?example\\.com(/|:|?)+"<br class=""></blockquote><br class="">This does not work, but<br class=""> "^https?://(www\\.)?example\\.com[/:?]+"<br class="">is equivalent.<br class=""><br class=""><blockquote type="cite" class=""> }<br class=""> }<br class=""><br class="">or:<br class=""><br class=""> {<br class="">"action" : {<br class="">"type" : "block"<br class=""> },<br class="">"trigger" : {<br class="">"url-filter" : "^https?://(www\\.)?example\\.com[^.]"<br class=""></blockquote><br class="">This pattern should work fine in beta 5.<br class=""><br class=""><blockquote type="cite" 
class=""> }<br class=""> }<br class=""><br class="">Please note that in this case, the if-domain field wouldn’t help for<br class="">embedded content.<br class=""><br class="">Should I write the same rule many times for the different cases (“/”,<br class="">“:”, “?”)? (doesn’t feel like a very elegant solution though). Since<br class="">they share the same prefix, will these rules be optimized? On the WebKit<br class="">blog, it is written “The rules are grouped by the prefix “https?://”,<br class="">and it only counts as one rule with quantifiers.”. Does it mean that it<br class="">will only count as one rule against the 50,000 rule limit?<br class=""></blockquote><br class="">Having 3 rules with 3 different endings is fine as long as they are not quantified. Their shared prefix would be merged in the compiler frontend.<br class=""><br class="">Having 3 rules with quantifiers per URL would likely cause your rules to be rejected by the compiler even under the 50k rule limit.<br class=""><br class="">In any case, the 50k rule limit is on the number of triggers. The number of rules is counted before rules are merged.<br class=""><br class=""><blockquote type="cite" class="">Do you see an elegant solution to handle this case? 
If not, could you<br class="">please consider adding at least one of those regular expression features<br class="">for content blockers in Safari?<br class=""></blockquote><br class="">Are the solutions above good enough for your use case?<br class=""><br class="">Benjamin</div></blockquote><br class=""></div><div class=""><br class=""></div><div class=""><div class="">On Aug 17, 2015, at 8:48 PM, Alex Christensen <<a href="mailto:achristensen@apple.com" class="">achristensen@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Aug 17, 2015, at 11:03 AM, Romain Jacquinot <<a href="mailto:rjacquinot@me.com" class="">rjacquinot@me.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Hi,<div class=""><br class=""></div><div class="">For now, the following regular expression features are supported by content blockers:</div><div class=""><div class=""><ul class=""><li class="">Matching any character with “.”.</li><li class="">Matching ranges with the range syntax [a-b].</li><li class="">Quantifying expressions with “?”, “+” and “*”.</li><li class="">Groups with parenthesis.</li><li class="">Beginning of line (“^”) and end of line (“$”) marker</li></ul></div></div><div class="">However, there doesn’t seem to be a way to find any of the alternatives specified with “|” or find any character not between the brackets "[^]”.</div></div></div></blockquote><div class="">| is indeed not implemented yet.</div>If I’m not mistaken, [^a] should work, though. 
You could always do tricky things with ranges, like [\u0001-.0-9;->@-\u007F] but this doesn’t read very well and it might lead to hard-to-find errors for those of us that don’t have ASCII memorized.<br class=""><blockquote type="cite" class=""><div class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div class=""><br class=""></div><div class="">This is an issue when you want to block addresses like <b class=""><a href="http://www/" class="">http://www</a>.<a href="http://example.com/" class="">example.com</a><font color="#4f7a28" class="">/</font></b>, <b class=""><a href="https://example.com/" class="">https://example.com</a><font color="#4f7a28" class="">/</font></b>foobar.jpg, <b class=""><a href="http://example.com/" class="">http://example.com</a><font color="#4f7a28" class="">:</font></b>8080 but not <b class=""><a href="http://example.com/" class="">http://example.com</a></b><font color="#ff2600" class=""><b class="">.</b></font>hk.</div><div class=""><br class=""></div><div class="">With at least one of those features, you could write something like:</div><div class=""><br class=""></div><div class=""><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> {</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> <span class="" style="color: rgb(209, 47, 27);">"action"</span> : {</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> <span class="" style="color: rgb(209, 47, 27);">"type"</span> : <span class="" style="color: rgb(209, 47, 27);">"block"</span></div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> },</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> <span class="" style="color: rgb(209, 47, 27);">"trigger"</span> : {</div><div class="" style="margin: 0px; 
font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(209, 47, 27);"><span class=""> </span>"url-filter"<span class=""> : </span>"^https?://(www\\.)?example\\.com(/|:|?)+"</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> }</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> }</div></div><div class=""><br class=""></div><div class="">or:</div><div class=""><br class=""></div><div class=""><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> {</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> <span class="" style="color: rgb(209, 47, 27);">"action"</span> : {</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> <span class="" style="color: rgb(209, 47, 27);">"type"</span> : <span class="" style="color: rgb(209, 47, 27);">"block"</span></div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> },</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> <span class="" style="color: rgb(209, 47, 27);">"trigger"</span> : {</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(209, 47, 27);"><span class=""> </span>"url-filter"<span class=""> : </span>"^https?://(www\\.)?example\\.com[^.]"</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> }</div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"> }</div></div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"><br class=""></div><div class="" style="margin: 0px; line-height: normal;"><div class=""><span class="">Please note that in this case, the </span><font face="Menlo" class="" style="font-size: 
11px;">if-domain</font> field wouldn’t help for embedded content.</div><div class=""><br class=""></div><div class="">Should I write the same rule many times for the different cases (“/”, “:”, “?”)? (doesn’t feel like a very elegant solution though). Since they share the same prefix, will these rules be optimized? On the WebKit blog, it is written “<i class="">The rules are grouped by the prefix “https?://”, and it only counts as one rule with quantifiers.</i>”. Does it mean that it will only count as one rule against the 50,000 rule limit?</div></div></div></div></blockquote><div class="">Rules sharing a prefix are combined into the same DFA when the regular expressions are compiled. Fewer DFAs means faster matching. A prefix in this case is all the terms of a regular expression up to the last quantified term. So ab?c and ab?d would be combined into the same DFA, and there wouldn’t be much of a performance penalty for adding more regular expressions that start with ab? and have no other quantified terms. But ab?cd?e has another quantified term, so it would be put into a separate DFA in our implementation. In your case, if all your rules start with ^https? and have no other quantified terms, they will all be optimized well. But if the rules have unique terms before the last quantified term, like ^https?://a\.(com)?, ^https://b\.(com)?, ^https://c\.(com)?, etc., then they will not be combined well, and it will hurt performance when checking if a URL matches the rules. 
To make it simple, the less you use ?, *, or +, the faster it will be.</div><div class=""><br class=""></div>You could write a rule many times, but the 50000 rule limit applies when parsing the rules, so each rule will count towards that limit.</div><div class=""><blockquote type="cite" class=""><div class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div class="" style="margin: 0px; line-height: normal;"><div class=""><br class=""></div><div class="">Do you see an elegant solution to handle this case? If not, could you please consider adding at least one of those regular expression features for content blockers in Safari?</div></div></div></div></blockquote>You could do something like ^https?://(www\.)?example\.com[/:?]<br class=""><blockquote type="cite" class=""><div class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div class="" style="margin: 0px; line-height: normal;"><div class=""><br class=""></div><div class="">Thanks.</div><div class=""><br class=""></div><div class=""></div></div></div>_______________________________________________<br class="">webkit-help mailing list<br class=""><a href="mailto:webkit-help@lists.webkit.org" class="">webkit-help@lists.webkit.org</a><br class=""><a href="https://lists.webkit.org/mailman/listinfo/webkit-help" class="">https://lists.webkit.org/mailman/listinfo/webkit-help</a><br class=""></div></blockquote></div><br class=""></div></div></div></body></html>