<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[184401] trunk/Source/WebCore</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/184401">184401</a></dd>
<dt>Author</dt> <dd>ap@apple.com</dd>
<dt>Date</dt> <dd>2015-05-15 11:41:54 -0700 (Fri, 15 May 2015)</dd>
</dl>
<h3>Log Message</h3>
<pre>Cyrillic top-level domains are displayed as punycode
https://bugs.webkit.org/show_bug.cgi?id=145024
rdar://problem/17747133
rdar://problem/14116594
Reviewed by Tim Horton.
Handling each TLD in code is annoying, but we can probably survive like this
for a few more years, and maybe we'll think of an entirely different way to deal
with non-ASCII domain labels in the meanwhile.
* platform/mac/WebCoreNSURLExtras.mm:
(WebCore::isSecondLevelDomainNameAllowedByTLDRules):
(WebCore::allCharactersAllowedByTLDRules):</pre>
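<p>For readers skimming the diff, the shape of the new check can be sketched in portable C++ as below. This is an illustration under assumptions, not the code in the patch: the names <code>CharacterPredicate</code>, <code>isRussianCharacter</code>, <code>secondLevelLabelAllowed</code> and <code>hostAllowedForSuffix</code> are hypothetical, and <code>std::u16string_view</code> stands in for the <code>UChar</code> buffer and length pair. Unlike the macro in the patch, which falls through to the next TLD when a suffix does not match, the helper here simply returns false for a non-matching suffix.</p>
<pre>
```cpp
#include <cassert>
#include <functional>
#include <string_view>

// Hypothetical names throughout; a sketch of the technique, not the WebKit code.
using CharacterPredicate = std::function<bool(char16_t)>;

// Modern Russian letters, digits and dashes (same ranges as the patch uses).
static bool isRussianCharacter(char16_t ch)
{
    return (ch >= 0x0430 && ch <= 0x044F) || ch == 0x0451 || (ch >= u'0' && ch <= u'9') || ch == u'-';
}

// Walk backwards over the host with its TLD suffix already removed.
// Stop at the first dot so that only the second-level label is checked.
static bool secondLevelLabelAllowed(std::u16string_view prefix, const CharacterPredicate& allowed)
{
    assert(!prefix.empty());
    for (auto it = prefix.rbegin(); it != prefix.rend(); ++it) {
        if (allowed(*it))
            continue;
        if (*it == u'.')
            break; // Lower-level labels are not checked.
        return false;
    }
    return true;
}

// Returns true only if the host ends with the suffix (including its leading dot)
// and every character of the second-level label passes the predicate.
static bool hostAllowedForSuffix(std::u16string_view host, std::u16string_view suffix, const CharacterPredicate& allowed)
{
    if (host.size() <= suffix.size())
        return false;
    if (host.substr(host.size() - suffix.size()) != suffix)
        return false;
    return secondLevelLabelAllowed(host.substr(0, host.size() - suffix.size()), allowed);
}
```
</pre>
<p>Under these assumptions, a host whose second-level label is entirely Cyrillic passes for its suffix, a label mixing Latin and Cyrillic letters is rejected, and labels to the left of the second-level label are ignored.</p>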
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceWebCoreChangeLog">trunk/Source/WebCore/ChangeLog</a></li>
<li><a href="#trunkSourceWebCoreplatformmacWebCoreNSURLExtrasmm">trunk/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/ChangeLog (184400 => 184401)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/ChangeLog        2015-05-15 18:40:09 UTC (rev 184400)
+++ trunk/Source/WebCore/ChangeLog        2015-05-15 18:41:54 UTC (rev 184401)
</span><span class="lines">@@ -1,3 +1,20 @@
</span><ins>+2015-05-15 Alexey Proskuryakov &lt;ap@apple.com&gt;
+
+ Cyrillic top-level domains are displayed as punycode
+ https://bugs.webkit.org/show_bug.cgi?id=145024
+ rdar://problem/17747133
+ rdar://problem/14116594
+
+ Reviewed by Tim Horton.
+
+ Handling each TLD in code is annoying, but we can probably survive like this
+ for a few more years, and maybe we'll think of an entirely different way to deal
+ with non-ASCII domain labels in the meanwhile.
+
+ * platform/mac/WebCoreNSURLExtras.mm:
+ (WebCore::isSecondLevelDomainNameAllowedByTLDRules):
+ (WebCore::allCharactersAllowedByTLDRules):
+
</ins><span class="cx"> 2015-05-15 Roger Fong &lt;roger_fong@apple.com&gt;
</span><span class="cx">
</span><span class="cx"> Cursor is displayed after full screen video controls fade away.
</span></span></pre></div>
<a id="trunkSourceWebCoreplatformmacWebCoreNSURLExtrasmm"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm (184400 => 184401)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm        2015-05-15 18:40:09 UTC (rev 184400)
+++ trunk/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm        2015-05-15 18:41:54 UTC (rev 184401)
</span><span class="lines">@@ -31,6 +31,7 @@
</span><span class="cx"> #import "WebCoreNSStringExtras.h"
</span><span class="cx"> #import "WebCoreNSURLExtras.h"
</span><span class="cx"> #import "WebCoreSystemInterface.h"
</span><ins>+#import &lt;functional&gt;
</ins><span class="cx"> #import &lt;wtf/ObjcRuntimeExtras.h&gt;
</span><span class="cx"> #import &lt;wtf/RetainPtr.h&gt;
</span><span class="cx"> #import &lt;wtf/Vector.h&gt;
</span><span class="lines">@@ -262,33 +263,189 @@
</span><span class="cx"> return YES;
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+static bool isSecondLevelDomainNameAllowedByTLDRules(const UChar* buffer, int32_t length, const std::function&lt;bool(UChar)&gt;&amp; characterIsAllowed)
+{
+ ASSERT(length > 0);
+
+ for (int32_t i = length - 1; i >= 0; --i) {
+ UChar ch = buffer[i];
+
+ if (characterIsAllowed(ch))
+ continue;
+
+ // Only check the second level domain. Lower level registrars may have different rules.
+ if (ch == '.')
+ break;
+
+ return false;
+ }
+ return true;
+}
+
+#define CHECK_RULES_IF_SUFFIX_MATCHES(suffix, function) \
+ { \
+ static const int32_t suffixLength = sizeof(suffix) / sizeof(suffix[0]); \
+ if (length > suffixLength && 0 == memcmp(buffer + length - suffixLength, suffix, sizeof(suffix))) \
+ return isSecondLevelDomainNameAllowedByTLDRules(buffer, length - suffixLength, function); \
+ }
+
+static bool isRussianDomainNameCharacter(UChar ch)
+{
+ // Only modern Russian letters, digits and dashes are allowed.
+ return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || (ch >= '0' && ch <= '9') || ch == '-';
+}
+
</ins><span class="cx"> static BOOL allCharactersAllowedByTLDRules(const UChar* buffer, int32_t length)
</span><span class="cx"> {
</span><span class="cx"> // Skip trailing dot for root domain.
</span><span class="cx"> if (buffer[length - 1] == '.')
</span><span class="cx"> length--;
</span><del>-
- if (length > 3 && buffer[length - 3] == '.'
- && buffer[length - 2] == 0x0440 // CYRILLIC SMALL LETTER ER
- && buffer[length - 1] == 0x0444) // CYRILLIC SMALL LETTER EF
- {
- // Rules defined by &lt;http://www.cctld.ru/ru/docs/rulesrf.php&gt;. This code only checks requirements that matter for presentation purposes.
- for (int32_t i = length - 4; i; --i) {
- UChar ch = buffer[i];
-
- // Only modern Russian letters, digits and dashes are allowed.
- if ((ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451|| (ch >= '0' && ch <= '9') || ch == '-')
- continue;
-
- // Only check top level domain. Lower level registrars may have different rules.
- if (ch == '.')
- break;
-
- return NO;
- }
- return YES;
- }
-
</del><ins>+
+ // http://cctld.ru/files/pdf/docs/rules_ru-rf.pdf
+ static const UChar cyrillicRF[] = {
+ '.',
+ 0x0440, // CYRILLIC SMALL LETTER ER
+ 0x0444 // CYRILLIC SMALL LETTER EF
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicRF, isRussianDomainNameCharacter);
+
+ // http://rusnames.ru/rules.pl
+ static const UChar cyrillicRUS[] = {
+ '.',
+ 0x0440, // CYRILLIC SMALL LETTER ER
+ 0x0443, // CYRILLIC SMALL LETTER U
+ 0x0441 // CYRILLIC SMALL LETTER ES
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicRUS, isRussianDomainNameCharacter);
+
+ // http://ru.faitid.org/projects/moscow/documents/moskva/idn
+ static const UChar cyrillicMOSKVA[] = {
+ '.',
+ 0x043C, // CYRILLIC SMALL LETTER EM
+ 0x043E, // CYRILLIC SMALL LETTER O
+ 0x0441, // CYRILLIC SMALL LETTER ES
+ 0x043A, // CYRILLIC SMALL LETTER KA
+ 0x0432, // CYRILLIC SMALL LETTER VE
+ 0x0430 // CYRILLIC SMALL LETTER A
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicMOSKVA, isRussianDomainNameCharacter);
+
+ // http://www.dotdeti.ru/foruser/docs/regrules.php
+ static const UChar cyrillicDETI[] = {
+ '.',
+ 0x0434, // CYRILLIC SMALL LETTER DE
+ 0x0435, // CYRILLIC SMALL LETTER IE
+ 0x0442, // CYRILLIC SMALL LETTER TE
+ 0x0438 // CYRILLIC SMALL LETTER I
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicDETI, isRussianDomainNameCharacter);
+
+ // http://corenic.org - rules not published. The word is Russian, so only allowing Russian at this time,
+ // although we may need to revise the checks if this ends up being used with other languages spoken in Russia.
+ static const UChar cyrillicONLAYN[] = {
+ '.',
+ 0x043E, // CYRILLIC SMALL LETTER O
+ 0x043D, // CYRILLIC SMALL LETTER EN
+ 0x043B, // CYRILLIC SMALL LETTER EL
+ 0x0430, // CYRILLIC SMALL LETTER A
+ 0x0439, // CYRILLIC SMALL LETTER SHORT I
+ 0x043D // CYRILLIC SMALL LETTER EN
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicONLAYN, isRussianDomainNameCharacter);
+
+ // http://corenic.org - same as above.
+ static const UChar cyrillicSAYT[] = {
+ '.',
+ 0x0441, // CYRILLIC SMALL LETTER ES
+ 0x0430, // CYRILLIC SMALL LETTER A
+ 0x0439, // CYRILLIC SMALL LETTER SHORT I
+ 0x0442 // CYRILLIC SMALL LETTER TE
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicSAYT, isRussianDomainNameCharacter);
+
+ // http://pir.org/products/opr-domain/ - rules not published. According to the registry site,
+ // the intended audience is "Russian and other Slavic-speaking markets".
+ // Chrome appears to only allow Russian, so sticking with that for now.
+ static const UChar cyrillicORG[] = {
+ '.',
+ 0x043E, // CYRILLIC SMALL LETTER O
+ 0x0440, // CYRILLIC SMALL LETTER ER
+ 0x0433 // CYRILLIC SMALL LETTER GHE
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicORG, isRussianDomainNameCharacter);
+
+ // http://cctld.by/rules.html
+ static const UChar cyrillicBEL[] = {
+ '.',
+ 0x0431, // CYRILLIC SMALL LETTER BE
+ 0x0435, // CYRILLIC SMALL LETTER IE
+ 0x043B // CYRILLIC SMALL LETTER EL
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicBEL, [](UChar ch) {
+ // Russian and Byelorussian letters, digits and dashes are allowed.
+ return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x0456 || ch == 0x045E || ch == 0x2019 || (ch >= '0' && ch <= '9') || ch == '-';
+ });
+
+ // http://www.nic.kz/docs/poryadok_vnedreniya_kaz_ru.pdf
+ static const UChar cyrillicKAZ[] = {
+ '.',
+ 0x049B, // CYRILLIC SMALL LETTER KA WITH DESCENDER
+ 0x0430, // CYRILLIC SMALL LETTER A
+ 0x0437 // CYRILLIC SMALL LETTER ZE
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicKAZ, [](UChar ch) {
+ // Kazakh letters, digits and dashes are allowed.
+ return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x04D9 || ch == 0x0493 || ch == 0x049B || ch == 0x04A3 || ch == 0x04E9 || ch == 0x04B1 || ch == 0x04AF || ch == 0x04BB || ch == 0x0456 || (ch >= '0' && ch <= '9') || ch == '-';
+ });
+
+ // http://uanic.net/docs/documents-ukr/Rules%20of%20UKR_v4.0.pdf
+ static const UChar cyrillicUKR[] = {
+ '.',
+ 0x0443, // CYRILLIC SMALL LETTER U
+ 0x043A, // CYRILLIC SMALL LETTER KA
+ 0x0440 // CYRILLIC SMALL LETTER ER
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicUKR, [](UChar ch) {
+ // Russian and Ukrainian letters, digits and dashes are allowed.
+ return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x0491 || ch == 0x0404 || ch == 0x0456 || ch == 0x0457 || (ch >= '0' && ch <= '9') || ch == '-';
+ });
+
+ // http://www.rnids.rs/data/DOKUMENTI/idn-srb-policy-termsofuse-v1.4-eng.pdf
+ static const UChar cyrillicSRB[] = {
+ '.',
+ 0x0441, // CYRILLIC SMALL LETTER ES
+ 0x0440, // CYRILLIC SMALL LETTER ER
+ 0x0431 // CYRILLIC SMALL LETTER BE
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicSRB, [](UChar ch) {
+ // Serbian letters, digits and dashes are allowed.
+ return (ch >= 0x0430 && ch <= 0x0438) || (ch >= 0x043A && ch <= 0x0448) || ch == 0x0452 || ch == 0x0458 || ch == 0x0459 || ch == 0x045A || ch == 0x045B || ch == 0x045F || (ch >= '0' && ch <= '9') || ch == '-';
+ });
+
+ // http://marnet.mk/doc/pravilnik-mk-mkd.pdf
+ static const UChar cyrillicMKD[] = {
+ '.',
+ 0x043C, // CYRILLIC SMALL LETTER EM
+ 0x043A, // CYRILLIC SMALL LETTER KA
+ 0x0434 // CYRILLIC SMALL LETTER DE
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicMKD, [](UChar ch) {
+ // Macedonian letters, digits and dashes are allowed.
+ return (ch >= 0x0430 && ch <= 0x0438) || (ch >= 0x043A && ch <= 0x0448) || ch == 0x0453 || ch == 0x0455 || ch == 0x0458 || ch == 0x0459 || ch == 0x045A || ch == 0x045C || ch == 0x045F || (ch >= '0' && ch <= '9') || ch == '-';
+ });
+
+ // https://www.mon.mn/cs/
+ static const UChar cyrillicMON[] = {
+ '.',
+ 0x043C, // CYRILLIC SMALL LETTER EM
+ 0x043E, // CYRILLIC SMALL LETTER O
+ 0x043D // CYRILLIC SMALL LETTER EN
+ };
+ CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicMON, [](UChar ch) {
+ // Mongolian letters, digits and dashes are allowed.
+ return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x04E9 || ch == 0x04AF || (ch >= '0' && ch <= '9') || ch == '-';
+ });
+
</ins><span class="cx"> // Not a known top level domain with special rules.
</span><span class="cx"> return NO;
</span><span class="cx"> }
</span></span></pre>
</div>
</div>
</body>
</html>