<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[183552] trunk</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/183552">183552</a></dd>
<dt>Author</dt> <dd>darin@apple.com</dd>
<dt>Date</dt> <dd>2015-04-29 09:33:12 -0700 (Wed, 29 Apr 2015)</dd>
</dl>
<h3>Log Message</h3>
<pre>[ES6] Implement Unicode code point escapes
https://bugs.webkit.org/show_bug.cgi?id=144377
Reviewed by Antti Koivisto.
Source/JavaScriptCore:
* parser/Lexer.cpp: Moved the UnicodeHexValue class in here from
the header. Made it a non-member class so it doesn't need to be part
of a template. Made it use UChar32 instead of int for the value to
make it clearer what goes into this class.
(JSC::ParsedUnicodeEscapeValue::isIncomplete): Added. Replaces the
old type() function.
(JSC::Lexer<CharacterType>::parseUnicodeEscape): Renamed from
parseFourDigitUnicodeHex and added support for code point escapes.
(JSC::isLatin1): Added an overload for UChar32.
(JSC::isIdentStart): Changed this to take UChar32; no caller tries
to call it with a UChar, so no need to overload for that type for now.
(JSC::isNonLatin1IdentPart): Changed argument type to UChar32 for clarity.
Also added FIXME about a subtle ES6 change that we might want to make later.
(JSC::isIdentPart): Changed this to take UChar32; no caller tries
to call it with a UChar, so no need to overload for that type for now.
(JSC::isIdentPartIncludingEscapeTemplate): Made this a template so that we
don't need to repeat the code twice. Added code to handle code point escapes.
(JSC::isIdentPartIncludingEscape): Call the template instead of having the
code in line.
(JSC::Lexer<CharacterType>::recordUnicodeCodePoint): Added.
(JSC::Lexer<CharacterType>::parseIdentifierSlowCase): Made small tweaks and
updated to call parseUnicodeEscape instead of parseFourDigitUnicodeHex.
(JSC::Lexer<CharacterType>::parseComplexEscape): Call parseUnicodeEscape
instead of parseFourDigitUnicodeHex. Move the code to handle "\u" before
the code that handles the escapes, since the code point escape code now
consumes characters while parsing rather than peeking ahead. Test case
covers this: Symptom would be that "\u{" would evaluate to "u" instead of
giving a syntax error.
* parser/Lexer.h: Updated for above changes.
* runtime/StringConstructor.cpp:
(JSC::stringFromCodePoint): Use ICU's UCHAR_MAX_VALUE instead of writing
out 0x10FFFF; clearer this way.
Source/WebCore:
Test: js/unicode-escape-sequences.html
* css/CSSParser.cpp:
(WebCore::CSSParser::parseEscape): Use ICU's UCHAR_MAX_VALUE instead of writing
out 0x10FFFF; clearer this way. Also use our replacementCharacter instead of
writing out 0xFFFD.
* html/parser/HTMLEntityParser.cpp:
(WebCore::isAlphaNumeric): Deleted.
(WebCore::HTMLEntityParser::legalEntityFor): Use ICU's UCHAR_MAX_VALUE and
U_IS_SURROGATE instead of writing the code out. Didn't use U_IS_UNICODE_CHAR
because that also includes U_IS_UNICODE_NONCHAR and thus would change behavior,
but maybe it's something we want to do in the future.
(WebCore::HTMLEntityParser::consumeNamedEntity): Use isASCIIAlphanumeric instead
of the function in this file that does the same thing less efficiently.
* html/parser/InputStreamPreprocessor.h:
(WebCore::InputStreamPreprocessor::processNextInputCharacter): Use
replacementCharacter from CharacterNames.h instead of writing out 0xFFFD.
* xml/parser/CharacterReferenceParserInlines.h:
(WebCore::consumeCharacterReference): Use ICU's UCHAR_MAX_VALUE instead of
defining our own local highestValidCharacter constant.
LayoutTests:
* js/script-tests/unicode-escape-sequences.js: Added.
* js/unicode-escape-sequences-expected.txt: Added.
* js/unicode-escape-sequences.html: Added. Generated with make-script-test-wrappers.</pre>
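<p>The note on HTMLEntityParser::legalEntityFor says U_IS_UNICODE_CHAR was avoided because it also excludes noncharacters. The following is a minimal JavaScript sketch of that difference, not the WebKit code: the function names are hypothetical, and the predicates are written from the documented meaning of ICU's U_IS_SURROGATE, U_IS_UNICODE_NONCHAR and U_IS_UNICODE_CHAR macros. The zero check and any other conditions in the real function are not modeled.</p>

```javascript
// Hypothetical JavaScript mirrors of the ICU macros named in the log message.
const MAX_CODE_POINT = 0x10FFFF; // the value of ICU's UCHAR_MAX_VALUE

// Mirrors U_IS_SURROGATE: true for U+D800..U+DFFF.
function isSurrogate(c) {
    return c >= 0xD800 && c <= 0xDFFF;
}

// Mirrors U_IS_UNICODE_NONCHAR: U+FDD0..U+FDEF plus every U+xxFFFE and U+xxFFFF.
function isNoncharacter(c) {
    return c >= 0xFDD0 && (c <= 0xFDEF || (c & 0xFFFE) === 0xFFFE) && c <= MAX_CODE_POINT;
}

// The kind of check the log describes: within range and not a surrogate.
function isInRangeNonSurrogate(c) {
    return c >= 0 && c <= MAX_CODE_POINT && !isSurrogate(c);
}

// Mirrors U_IS_UNICODE_CHAR: additionally rejects noncharacters.
function isUnicodeChar(c) {
    return isInRangeNonSurrogate(c) && !isNoncharacter(c);
}

// The two checks disagree only on noncharacters such as U+FFFE.
for (const c of [0x41, 0xD800, 0xFFFE, 0x10FFFF, 0x110000]) {
    console.log(c.toString(16), isInRangeNonSurrogate(c), isUnicodeChar(c));
}
```

<p>Under these assumptions, switching to the stricter predicate would change the result for inputs such as U+FFFE and U+10FFFF, which is the behavior change the log message refers to.</p>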
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkLayoutTestsChangeLog">trunk/LayoutTests/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreparserLexercpp">trunk/Source/JavaScriptCore/parser/Lexer.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreparserLexerh">trunk/Source/JavaScriptCore/parser/Lexer.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreruntimeStringConstructorcpp">trunk/Source/JavaScriptCore/runtime/StringConstructor.cpp</a></li>
<li><a href="#trunkSourceWebCoreChangeLog">trunk/Source/WebCore/ChangeLog</a></li>
<li><a href="#trunkSourceWebCorecssCSSParsercpp">trunk/Source/WebCore/css/CSSParser.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLEntityParsercpp">trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserInputStreamPreprocessorh">trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h</a></li>
<li><a href="#trunkSourceWebCorexmlparserCharacterReferenceParserInlinesh">trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h</a></li>
</ul>
<h3>Added Paths</h3>
<ul>
<li><a href="#trunkLayoutTestsjsscripttestsunicodeescapesequencesjs">trunk/LayoutTests/js/script-tests/unicode-escape-sequences.js</a></li>
<li><a href="#trunkLayoutTestsjsunicodeescapesequencesexpectedtxt">trunk/LayoutTests/js/unicode-escape-sequences-expected.txt</a></li>
<li><a href="#trunkLayoutTestsjsunicodeescapesequenceshtml">trunk/LayoutTests/js/unicode-escape-sequences.html</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkLayoutTestsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/ChangeLog (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/ChangeLog        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/LayoutTests/ChangeLog        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -1,3 +1,14 @@
</span><ins>+2015-04-29 Darin Adler <darin@apple.com>
+
+ [ES6] Implement Unicode code point escapes
+ https://bugs.webkit.org/show_bug.cgi?id=144377
+
+ Reviewed by Antti Koivisto.
+
+ * js/script-tests/unicode-escape-sequences.js: Added.
+ * js/unicode-escape-sequences-expected.txt: Added.
+ * js/unicode-escape-sequences.html: Added. Generated with make-script-test-wrappers.
+
</ins><span class="cx"> 2015-04-29 Hyungwook Lee <hyungwook.lee@navercorp.com>
</span><span class="cx">
</span><span class="cx"> Fix crash in WebCore::LogicalSelectionOffsetCaches::ContainingBlockInfo::setBlock().
</span></span></pre></div>
<a id="trunkLayoutTestsjsscripttestsunicodeescapesequencesjs"></a>
<div class="addfile"><h4>Added: trunk/LayoutTests/js/script-tests/unicode-escape-sequences.js (0 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/script-tests/unicode-escape-sequences.js         (rev 0)
+++ trunk/LayoutTests/js/script-tests/unicode-escape-sequences.js        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -0,0 +1,138 @@
</span><ins>+description("Test of Unicode escape sequences in string literals and identifiers, especially code point escape sequences.");
+
+function codeUnits(string)
+{
+ var result = [];
+ for (var i = 0; i < string.length; ++i) {
+ var hex = "000" + string.charCodeAt(i).toString(16).toUpperCase();
+ result.push(hex.substring(hex.length - 4));
+ }
+ return result.join(",");
+}
+
+function testStringUnicodeEscapeSequence(sequence, expectedResult)
+{
+ shouldBeEqualToString('codeUnits("\\u' + sequence + '")', expectedResult);
+}
+
+function testInvalidStringUnicodeEscapeSequence(sequence)
+{
+ shouldThrow('codeUnits("\\u' + sequence + '")');
+}
+
+function testIdentifierStartUnicodeEscapeSequence(sequence, expectedResult)
+{
+ shouldBeEqualToString('codeUnits(function \\u' + sequence + '(){}.name)', expectedResult);
+}
+
+function testInvalidIdentifierStartUnicodeEscapeSequence(sequence)
+{
+ shouldThrow('codeUnits(function \\u' + sequence + '(){}.name)');
+}
+
+function testIdentifierPartUnicodeEscapeSequence(sequence, expectedResult)
+{
+ shouldBeEqualToString('codeUnits(function x\\u' + sequence + '(){}.name.substring(1))', expectedResult);
+}
+
+function testInvalidIdentifierPartUnicodeEscapeSequence(sequence)
+{
+ shouldThrow('codeUnits(function x\\u' + sequence + '(){}.name.substring(1))');
+}
+
+testStringUnicodeEscapeSequence("", "0075");
+testStringUnicodeEscapeSequence("{0}", "0000");
+testStringUnicodeEscapeSequence("{41}", "0041");
+testStringUnicodeEscapeSequence("{D800}", "D800");
+testStringUnicodeEscapeSequence("{d800}", "D800");
+testStringUnicodeEscapeSequence("{DC00}", "DC00");
+testStringUnicodeEscapeSequence("{dc00}", "DC00");
+testStringUnicodeEscapeSequence("{FFFF}", "FFFF");
+testStringUnicodeEscapeSequence("{ffff}", "FFFF");
+testStringUnicodeEscapeSequence("{10000}", "D800,DC00");
+testStringUnicodeEscapeSequence("{10001}", "D800,DC01");
+testStringUnicodeEscapeSequence("{102C0}", "D800,DEC0");
+testStringUnicodeEscapeSequence("{102c0}", "D800,DEC0");
+testStringUnicodeEscapeSequence("{1D306}", "D834,DF06");
+testStringUnicodeEscapeSequence("{1d306}", "D834,DF06");
+testStringUnicodeEscapeSequence("{10FFFE}", "DBFF,DFFE");
+testStringUnicodeEscapeSequence("{10fffe}", "DBFF,DFFE");
+testStringUnicodeEscapeSequence("{10FFFF}", "DBFF,DFFF");
+testStringUnicodeEscapeSequence("{10ffff}", "DBFF,DFFF");
+testStringUnicodeEscapeSequence("{00000000000000000000000010FFFF}", "DBFF,DFFF");
+testStringUnicodeEscapeSequence("{00000000000000000000000010ffff}", "DBFF,DFFF");
+
+testInvalidStringUnicodeEscapeSequence("x");
+testInvalidStringUnicodeEscapeSequence("{");
+testInvalidStringUnicodeEscapeSequence("{}");
+testInvalidStringUnicodeEscapeSequence("{G}");
+testInvalidStringUnicodeEscapeSequence("{1G}");
+testInvalidStringUnicodeEscapeSequence("{110000}");
+testInvalidStringUnicodeEscapeSequence("{1000000}");
+testInvalidStringUnicodeEscapeSequence("{100000000000000000000000}");
+
+testIdentifierStartUnicodeEscapeSequence("{41}", "0041");
+testIdentifierStartUnicodeEscapeSequence("{102C0}", "D800,DEC0");
+testIdentifierStartUnicodeEscapeSequence("{102c0}", "D800,DEC0");
+testIdentifierStartUnicodeEscapeSequence("{1D306}", "D834,DF06");
+testIdentifierStartUnicodeEscapeSequence("{1d306}", "D834,DF06");
+
+testInvalidIdentifierStartUnicodeEscapeSequence("");
+testInvalidIdentifierStartUnicodeEscapeSequence("{0}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{D800}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{d800}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{DC00}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{dc00}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{FFFF}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{ffff}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{10000}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{10001}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{10FFFE}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{10fffe}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{10FFFF}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{10ffff}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{00000000000000000000000010FFFF}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{00000000000000000000000010ffff}");
+
+testInvalidIdentifierStartUnicodeEscapeSequence("x");
+testInvalidIdentifierStartUnicodeEscapeSequence("{");
+testInvalidIdentifierStartUnicodeEscapeSequence("{}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{G}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{1G}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{110000}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{1000000}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{100000000000000000000000}");
+
+testIdentifierPartUnicodeEscapeSequence("{41}", "0041");
+testIdentifierPartUnicodeEscapeSequence("{10000}", "D800,DC00");
+testIdentifierPartUnicodeEscapeSequence("{10001}", "D800,DC01");
+testIdentifierPartUnicodeEscapeSequence("{102C0}", "D800,DEC0");
+testIdentifierPartUnicodeEscapeSequence("{102c0}", "D800,DEC0");
+
+testInvalidIdentifierPartUnicodeEscapeSequence("");
+testInvalidIdentifierPartUnicodeEscapeSequence("{0}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{D800}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{d800}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{DC00}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{dc00}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{FFFF}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{ffff}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{1D306}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{1d306}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{10FFFE}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{10fffe}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{10FFFF}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{10ffff}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{00000000000000000000000010FFFF}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{00000000000000000000000010ffff}");
+
+testInvalidIdentifierPartUnicodeEscapeSequence("x");
+testInvalidIdentifierPartUnicodeEscapeSequence("{");
+testInvalidIdentifierPartUnicodeEscapeSequence("{}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{G}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{1G}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{110000}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{1000000}");
+testInvalidIdentifierPartUnicodeEscapeSequence("{100000000000000000000000}");
+
+var successfullyParsed = true;
</ins></span></pre></div>
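<p>The expected strings in the test above, such as "D834,DF06" for \u{1D306}, follow from standard UTF-16 encoding. The sketch below shows that arithmetic in JavaScript. The helper names toUTF16CodeUnits and formatCodeUnits are hypothetical and are not part of the test or of WebKit.</p>

```javascript
// Encode one code point as UTF-16 code units (standard surrogate arithmetic).
// Lone surrogates such as U+D800 pass through as a single unit, as in the test.
function toUTF16CodeUnits(codePoint) {
    if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF)
        throw new RangeError("not a Unicode code point");
    if (codePoint <= 0xFFFF)
        return [codePoint];
    const offset = codePoint - 0x10000;      // 20 significant bits
    const high = 0xD800 + (offset >> 10);    // top 10 bits
    const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
    return [high, low];
}

// Format code units the way the test's codeUnits() helper does:
// four uppercase hex digits per unit, joined with commas.
function formatCodeUnits(units) {
    return units.map(u => u.toString(16).toUpperCase().padStart(4, "0")).join(",");
}

console.log(formatCodeUnits(toUTF16CodeUnits(0x1D306)));
console.log(formatCodeUnits(toUTF16CodeUnits(0x10FFFF)));
```

<p>For U+1D306 the offset is 0xD306, whose top ten bits are 0x34 and bottom ten bits are 0x306, giving the pair D834,DF06 that the test expects.</p>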
<a id="trunkLayoutTestsjsunicodeescapesequencesexpectedtxt"></a>
<div class="addfile"><h4>Added: trunk/LayoutTests/js/unicode-escape-sequences-expected.txt (0 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/unicode-escape-sequences-expected.txt         (rev 0)
+++ trunk/LayoutTests/js/unicode-escape-sequences-expected.txt        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -0,0 +1,96 @@
</span><ins>+Test of Unicode escape sequences in string literals and identifiers, especially code point escape sequences.
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+PASS codeUnits("\u") is "0075"
+PASS codeUnits("\u{0}") is "0000"
+PASS codeUnits("\u{41}") is "0041"
+PASS codeUnits("\u{D800}") is "D800"
+PASS codeUnits("\u{d800}") is "D800"
+PASS codeUnits("\u{DC00}") is "DC00"
+PASS codeUnits("\u{dc00}") is "DC00"
+PASS codeUnits("\u{FFFF}") is "FFFF"
+PASS codeUnits("\u{ffff}") is "FFFF"
+PASS codeUnits("\u{10000}") is "D800,DC00"
+PASS codeUnits("\u{10001}") is "D800,DC01"
+PASS codeUnits("\u{102C0}") is "D800,DEC0"
+PASS codeUnits("\u{102c0}") is "D800,DEC0"
+PASS codeUnits("\u{1D306}") is "D834,DF06"
+PASS codeUnits("\u{1d306}") is "D834,DF06"
+PASS codeUnits("\u{10FFFE}") is "DBFF,DFFE"
+PASS codeUnits("\u{10fffe}") is "DBFF,DFFE"
+PASS codeUnits("\u{10FFFF}") is "DBFF,DFFF"
+PASS codeUnits("\u{10ffff}") is "DBFF,DFFF"
+PASS codeUnits("\u{00000000000000000000000010FFFF}") is "DBFF,DFFF"
+PASS codeUnits("\u{00000000000000000000000010ffff}") is "DBFF,DFFF"
+PASS codeUnits("\ux") threw exception SyntaxError: \u can only be followed by a Unicode character sequence.
+PASS codeUnits("\u{") threw exception SyntaxError: \u can only be followed by a Unicode character sequence.
+PASS codeUnits("\u{}") threw exception SyntaxError: \u can only be followed by a Unicode character sequence.
+PASS codeUnits("\u{G}") threw exception SyntaxError: \u can only be followed by a Unicode character sequence.
+PASS codeUnits("\u{1G}") threw exception SyntaxError: \u can only be followed by a Unicode character sequence.
+PASS codeUnits("\u{110000}") threw exception SyntaxError: \u can only be followed by a Unicode character sequence.
+PASS codeUnits("\u{1000000}") threw exception SyntaxError: \u can only be followed by a Unicode character sequence.
+PASS codeUnits("\u{100000000000000000000000}") threw exception SyntaxError: \u can only be followed by a Unicode character sequence.
+PASS codeUnits(function \u{41}(){}.name) is "0041"
+PASS codeUnits(function \u{102C0}(){}.name) is "D800,DEC0"
+PASS codeUnits(function \u{102c0}(){}.name) is "D800,DEC0"
+PASS codeUnits(function \u{1D306}(){}.name) is "D834,DF06"
+PASS codeUnits(function \u{1d306}(){}.name) is "D834,DF06"
+PASS codeUnits(function \u(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u'.
+PASS codeUnits(function \u{0}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{0}'.
+PASS codeUnits(function \u{D800}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{D800}'.
+PASS codeUnits(function \u{d800}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{d800}'.
+PASS codeUnits(function \u{DC00}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{DC00}'.
+PASS codeUnits(function \u{dc00}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{dc00}'.
+PASS codeUnits(function \u{FFFF}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{FFFF}'.
+PASS codeUnits(function \u{ffff}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{ffff}'.
+PASS codeUnits(function \u{10000}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10000}'.
+PASS codeUnits(function \u{10001}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10001}'.
+PASS codeUnits(function \u{10FFFE}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10FFFE}'.
+PASS codeUnits(function \u{10fffe}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10fffe}'.
+PASS codeUnits(function \u{10FFFF}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10FFFF}'.
+PASS codeUnits(function \u{10ffff}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10ffff}'.
+PASS codeUnits(function \u{00000000000000000000000010FFFF}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{00000000000000000000000010FFFF}'.
+PASS codeUnits(function \u{00000000000000000000000010ffff}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{00000000000000000000000010ffff}'.
+PASS codeUnits(function \ux(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u'.
+PASS codeUnits(function \u{(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{'.
+PASS codeUnits(function \u{}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{'.
+PASS codeUnits(function \u{G}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{'.
+PASS codeUnits(function \u{1G}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{1'.
+PASS codeUnits(function \u{110000}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{11000'.
+PASS codeUnits(function \u{1000000}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{100000'.
+PASS codeUnits(function \u{100000000000000000000000}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{100000'.
+PASS codeUnits(function x\u{41}(){}.name.substring(1)) is "0041"
+PASS codeUnits(function x\u{10000}(){}.name.substring(1)) is "D800,DC00"
+PASS codeUnits(function x\u{10001}(){}.name.substring(1)) is "D800,DC01"
+PASS codeUnits(function x\u{102C0}(){}.name.substring(1)) is "D800,DEC0"
+PASS codeUnits(function x\u{102c0}(){}.name.substring(1)) is "D800,DEC0"
+PASS codeUnits(function x\u(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u'.
+PASS codeUnits(function x\u{0}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{0}'.
+PASS codeUnits(function x\u{D800}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{D800}'.
+PASS codeUnits(function x\u{d800}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{d800}'.
+PASS codeUnits(function x\u{DC00}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{DC00}'.
+PASS codeUnits(function x\u{dc00}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{dc00}'.
+PASS codeUnits(function x\u{FFFF}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{FFFF}'.
+PASS codeUnits(function x\u{ffff}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{ffff}'.
+PASS codeUnits(function x\u{1D306}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{1D306}'.
+PASS codeUnits(function x\u{1d306}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{1d306}'.
+PASS codeUnits(function x\u{10FFFE}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{10FFFE}'.
+PASS codeUnits(function x\u{10fffe}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{10fffe}'.
+PASS codeUnits(function x\u{10FFFF}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{10FFFF}'.
+PASS codeUnits(function x\u{10ffff}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{10ffff}'.
+PASS codeUnits(function x\u{00000000000000000000000010FFFF}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{00000000000000000000000010FFFF}'.
+PASS codeUnits(function x\u{00000000000000000000000010ffff}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{00000000000000000000000010ffff}'.
+PASS codeUnits(function x\ux(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u'.
+PASS codeUnits(function x\u{(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{'.
+PASS codeUnits(function x\u{}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{'.
+PASS codeUnits(function x\u{G}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{'.
+PASS codeUnits(function x\u{1G}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{1'.
+PASS codeUnits(function x\u{110000}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{11000'.
+PASS codeUnits(function x\u{1000000}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{100000'.
+PASS codeUnits(function x\u{100000000000000000000000}(){}.name.substring(1)) threw exception SyntaxError: Invalid unicode escape in identifier: 'x\u{100000'.
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
</ins><span class="cx">Property changes on: trunk/LayoutTests/js/unicode-escape-sequences-expected.txt
</span><span class="cx">___________________________________________________________________
</span></span></pre></div>
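<p>Several identifier error messages above quote only a prefix of the escape, for example '\u{11000' for \u{110000}. This appears consistent with a lexer that consumes characters as it parses and stops at the first character that makes the escape invalid. The JavaScript sketch below models the brace form under that assumption. parseBracedEscape and its result shape are hypothetical, and it is an illustration of the loop's behavior, not the Lexer.cpp implementation.</p>

```javascript
const MAX_CODE_POINT = 0x10FFFF;
const isHexDigit = ch => /^[0-9A-Fa-f]$/.test(ch);

// text is the source that follows "\u". Returns a status plus the number of
// characters consumed before parsing stopped.
function parseBracedEscape(text) {
    if (text[0] !== "{")
        return { status: "not-braced", consumed: 0 };
    let i = 1;          // skip the opening brace
    let codePoint = 0;
    do {
        const ch = text[i];
        if (ch === undefined)
            return { status: "incomplete", consumed: i };   // ran out of input
        if (!isHexDigit(ch))
            return { status: "invalid", consumed: i };      // includes "{}"
        codePoint = codePoint * 16 + parseInt(ch, 16);
        if (codePoint > MAX_CODE_POINT)
            return { status: "invalid", consumed: i };      // overflowing digit not consumed
        i++;
    } while (text[i] !== "}");
    return { status: "valid", value: codePoint, consumed: i + 1 };
}

// Show the consumed prefix for a few inputs from the test.
for (const input of ["{1D306}", "{110000}", "{1000000}", "{1G}", "{}"]) {
    const result = parseBracedEscape(input);
    console.log(input, result.status, "\\u" + input.slice(0, result.consumed));
}
```

<p>Because the range check runs after every digit, leading zeros never overflow the accumulator, which matches the test accepting \u{00000000000000000000000010FFFF}.</p>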
<a id="svneolstyle"></a>
<div class="addfile"><h4>Added: svn:eol-style</h4></div>
<a id="trunkLayoutTestsjsunicodeescapesequenceshtml"></a>
<div class="addfile"><h4>Added: trunk/LayoutTests/js/unicode-escape-sequences.html (0 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/unicode-escape-sequences.html         (rev 0)
+++ trunk/LayoutTests/js/unicode-escape-sequences.html        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -0,0 +1,8 @@
</span><ins>+<!DOCTYPE html>
+<html>
+<body>
+<script src="../resources/js-test-pre.js"></script>
+<script src="script-tests/unicode-escape-sequences.js"></script>
+<script src="../resources/js-test-post.js"></script>
+</body>
+</html>
</ins><span class="cx">Property changes on: trunk/LayoutTests/js/unicode-escape-sequences.html
</span><span class="cx">___________________________________________________________________
</span></span></pre></div>
<a id="svnmimetype"></a>
<div class="addfile"><h4>Added: svn:mime-type</h4></div>
<a id="svneolstyle"></a>
<div class="addfile"><h4>Added: svn:eol-style</h4></div>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/Source/JavaScriptCore/ChangeLog        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -1,3 +1,45 @@
</span><ins>+2015-04-29 Darin Adler <darin@apple.com>
+
+ [ES6] Implement Unicode code point escapes
+ https://bugs.webkit.org/show_bug.cgi?id=144377
+
+ Reviewed by Antti Koivisto.
+
+ * parser/Lexer.cpp: Moved the UnicodeHexValue class in here from
+ the header. Made it a non-member class so it doesn't need to be part
+ of a template. Made it use UChar32 instead of int for the value to
+ make it clearer what goes into this class.
+ (JSC::ParsedUnicodeEscapeValue::isIncomplete): Added. Replaces the
+ old type() function.
+ (JSC::Lexer<CharacterType>::parseUnicodeEscape): Renamed from
+ parseFourDigitUnicodeHex and added support for code point escapes.
+ (JSC::isLatin1): Added an overload for UChar32.
+ (JSC::isIdentStart): Changed this to take UChar32; no caller tries
+ to call it with a UChar, so no need to overload for that type for now.
+ (JSC::isNonLatin1IdentPart): Changed argument type to UChar32 for clarity.
+ Also added FIXME about a subtle ES6 change that we might want to make later.
+ (JSC::isIdentPart): Changed this to take UChar32; no caller tries
+ to call it with a UChar, so no need to overload for that type for now.
+ (JSC::isIdentPartIncludingEscapeTemplate): Made this a template so that we
+ don't need to repeat the code twice. Added code to handle code point escapes.
+ (JSC::isIdentPartIncludingEscape): Call the template instead of having the
+ code in line.
+ (JSC::Lexer<CharacterType>::recordUnicodeCodePoint): Added.
+ (JSC::Lexer<CharacterType>::parseIdentifierSlowCase): Made small tweaks and
+ updated to call parseUnicodeEscape instead of parseFourDigitUnicodeHex.
+ (JSC::Lexer<CharacterType>::parseComplexEscape): Call parseUnicodeEscape
+ instead of parseFourDigitUnicodeHex. Move the code to handle "\u" before
+ the code that handles the escapes, since the code point escape code now
+ consumes characters while parsing rather than peeking ahead. Test case
+ covers this: Symptom would be that "\u{" would evaluate to "u" instead of
+ giving a syntax error.
+
+ * parser/Lexer.h: Updated for above changes.
+
+ * runtime/StringConstructor.cpp:
+ (JSC::stringFromCodePoint): Use ICU's UCHAR_MAX_VALUE instead of writing
+ out 0x10FFFF; clearer this way.
+
</ins><span class="cx"> 2015-04-29 Martin Robinson <mrobinson@igalia.com>
</span><span class="cx">
</span><span class="cx"> [CMake] [GTK] Organize and clean up unused CMake variables
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreparserLexercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/parser/Lexer.cpp (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/parser/Lexer.cpp        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/Source/JavaScriptCore/parser/Lexer.cpp        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -610,22 +610,60 @@
</span><span class="cx"> return (code < m_codeEnd) ? *code : 0;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-template <typename T>
-typename Lexer<T>::UnicodeHexValue Lexer<T>::parseFourDigitUnicodeHex()
</del><ins>+struct ParsedUnicodeEscapeValue {
+ ParsedUnicodeEscapeValue(UChar32 value)
+ : m_value(value)
+ {
+ ASSERT(isValid());
+ }
+
+ enum SpecialValueType { Incomplete = -2, Invalid = -1 };
+ ParsedUnicodeEscapeValue(SpecialValueType type)
+ : m_value(type)
+ {
+ }
+
+ bool isValid() const { return m_value >= 0; }
+ bool isIncomplete() const { return m_value == Incomplete; }
+
+ UChar32 value() const
+ {
+ ASSERT(isValid());
+ return m_value;
+ }
+
+private:
+ UChar32 m_value;
+};
+
+template<typename CharacterType> ParsedUnicodeEscapeValue Lexer<CharacterType>::parseUnicodeEscape()
</ins><span class="cx"> {
</span><del>- T char1 = peek(1);
- T char2 = peek(2);
- T char3 = peek(3);
</del><ins>+ if (m_current == '{') {
+ shift();
+ UChar32 codePoint = 0;
+ do {
+ if (!isASCIIHexDigit(m_current))
+ return m_current ? ParsedUnicodeEscapeValue::Invalid : ParsedUnicodeEscapeValue::Incomplete;
+ codePoint = (codePoint << 4) | toASCIIHexValue(m_current);
+ if (codePoint > UCHAR_MAX_VALUE)
+ return ParsedUnicodeEscapeValue::Invalid;
+ shift();
+ } while (m_current != '}');
+ shift();
+ return codePoint;
+ }
</ins><span class="cx">
</span><del>- if (UNLIKELY(!isASCIIHexDigit(m_current) || !isASCIIHexDigit(char1) || !isASCIIHexDigit(char2) || !isASCIIHexDigit(char3)))
- return UnicodeHexValue((m_code + 4) >= m_codeEnd ? UnicodeHexValue::IncompleteHex : UnicodeHexValue::InvalidHex);
-
- int result = convertUnicode(m_current, char1, char2, char3);
</del><ins>+ auto character2 = peek(1);
+ auto character3 = peek(2);
+ auto character4 = peek(3);
+ if (UNLIKELY(!isASCIIHexDigit(m_current) || !isASCIIHexDigit(character2) || !isASCIIHexDigit(character3) || !isASCIIHexDigit(character4)))
+ return (m_code + 4) >= m_codeEnd ? ParsedUnicodeEscapeValue::Incomplete : ParsedUnicodeEscapeValue::Invalid;
+ auto result = convertUnicode(m_current, character2, character3, character4);
</ins><span class="cx"> shift();
</span><span class="cx"> shift();
</span><span class="cx"> shift();
</span><span class="cx"> shift();
</span><del>- return UnicodeHexValue(result);
</del><ins>+ return result;
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> template <typename T>
</span><span class="lines">@@ -665,18 +703,24 @@
</span><span class="cx"> return c < 256;
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+static ALWAYS_INLINE bool isLatin1(UChar32 c)
+{
+ return !(c & ~0xFF);
+}
+
</ins><span class="cx"> static inline bool isIdentStart(LChar c)
</span><span class="cx"> {
</span><span class="cx"> return typesOfLatin1Characters[c] == CharacterIdentifierStart;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-static inline bool isIdentStart(UChar c)
</del><ins>+static inline bool isIdentStart(UChar32 c)
</ins><span class="cx"> {
</span><span class="cx"> return isLatin1(c) ? isIdentStart(static_cast<LChar>(c)) : isNonLatin1IdentStart(c);
</span><span class="cx"> }
</span><span class="cx">
</span><del>-static NEVER_INLINE bool isNonLatin1IdentPart(int c)
</del><ins>+static NEVER_INLINE bool isNonLatin1IdentPart(UChar32 c)
</ins><span class="cx"> {
</span><ins>+ // FIXME: ES6 says this should be based on the Unicode property ID_Continue now instead.
</ins><span class="cx"> return (U_GET_GC_MASK(c) & (U_GC_L_MASK | U_GC_MN_MASK | U_GC_MC_MASK | U_GC_ND_MASK | U_GC_PC_MASK)) || c == 0x200C || c == 0x200D;
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -688,39 +732,59 @@
</span><span class="cx"> return typesOfLatin1Characters[c] <= CharacterNumber;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-static ALWAYS_INLINE bool isIdentPart(UChar c)
</del><ins>+static ALWAYS_INLINE bool isIdentPart(UChar32 c)
</ins><span class="cx"> {
</span><span class="cx"> return isLatin1(c) ? isIdentPart(static_cast<LChar>(c)) : isNonLatin1IdentPart(c);
</span><span class="cx"> }
</span><span class="cx">
</span><del>-template <typename T>
-bool isUnicodeEscapeIdentPart(const T* code)
</del><ins>+static ALWAYS_INLINE bool isIdentPart(UChar c)
</ins><span class="cx"> {
</span><del>- T char1 = code[0];
- T char2 = code[1];
- T char3 = code[2];
- T char4 = code[3];
-
- if (!isASCIIHexDigit(char1) || !isASCIIHexDigit(char2) || !isASCIIHexDigit(char3) || !isASCIIHexDigit(char4))
- return false;
-
- return isIdentPart(Lexer<T>::convertUnicode(char1, char2, char3, char4));
</del><ins>+ return isIdentPart(static_cast<UChar32>(c));
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-static ALWAYS_INLINE bool isIdentPartIncludingEscape(const LChar* code, const LChar* codeEnd)
</del><ins>+template<typename CharacterType> ALWAYS_INLINE bool isIdentPartIncludingEscapeTemplate(const CharacterType* code, const CharacterType* codeEnd)
</ins><span class="cx"> {
</span><del>- if (isIdentPart(*code))
</del><ins>+ if (isIdentPart(code[0]))
</ins><span class="cx"> return true;
</span><span class="cx">
</span><del>- return (*code == '\\' && ((codeEnd - code) >= 6) && code[1] == 'u' && isUnicodeEscapeIdentPart(code+2));
</del><ins>+ // Shortest sequence handled below is \u{0}, which is 5 characters.
+ if (!(code[0] == '\\' && codeEnd - code >= 5 && code[1] == 'u'))
+ return false;
+
+ if (code[2] == '{') {
+ UChar32 codePoint = 0;
+ const CharacterType* pointer;
+ for (pointer = &code[3]; pointer < codeEnd; ++pointer) {
+ auto digit = *pointer;
+ if (!isASCIIHexDigit(digit))
+ break;
+ codePoint = (codePoint << 4) | toASCIIHexValue(digit);
+ if (codePoint > UCHAR_MAX_VALUE)
+ return false;
+ }
+ return isIdentPart(codePoint) && pointer < codeEnd && *pointer == '}';
+ }
+
+ // Shortest sequence handled below is \uXXXX, which is 6 characters.
+ if (codeEnd - code < 6)
+ return false;
+
+ auto character1 = code[2];
+ auto character2 = code[3];
+ auto character3 = code[4];
+ auto character4 = code[5];
+ return isASCIIHexDigit(character1) && isASCIIHexDigit(character2) && isASCIIHexDigit(character3) && isASCIIHexDigit(character4)
+ && isIdentPart(Lexer<LChar>::convertUnicode(character1, character2, character3, character4));
</ins><span class="cx"> }
</span><span class="cx">
</span><ins>+static ALWAYS_INLINE bool isIdentPartIncludingEscape(const LChar* code, const LChar* codeEnd)
+{
+ return isIdentPartIncludingEscapeTemplate(code, codeEnd);
+}
+
</ins><span class="cx"> static ALWAYS_INLINE bool isIdentPartIncludingEscape(const UChar* code, const UChar* codeEnd)
</span><span class="cx"> {
</span><del>- if (isIdentPart(*code))
- return true;
-
- return (*code == '\\' && ((codeEnd - code) >= 6) && code[1] == 'u' && isUnicodeEscapeIdentPart(code+2));
</del><ins>+ return isIdentPartIncludingEscapeTemplate(code, codeEnd);
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> static inline LChar singleEscape(int c)
</span><span class="lines">@@ -799,6 +863,18 @@
</span><span class="cx"> m_buffer16.append(static_cast<UChar>(c));
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+template<typename CharacterType> inline void Lexer<CharacterType>::recordUnicodeCodePoint(UChar32 codePoint)
+{
+ ASSERT(codePoint >= 0);
+ ASSERT(codePoint <= UCHAR_MAX_VALUE);
+ if (U_IS_BMP(codePoint))
+ record16(codePoint);
+ else {
+ UChar codeUnits[2] = { U16_LEAD(codePoint), U16_TRAIL(codePoint) };
+ append16(codeUnits, 2);
+ }
+}
+
</ins><span class="cx"> #if !ASSERT_DISABLED
</span><span class="cx"> bool isSafeBuiltinIdentifier(VM& vm, const Identifier* ident)
</span><span class="cx"> {
</span><span class="lines">@@ -807,6 +883,7 @@
</span><span class="cx"> /* Just block any use of suspicious identifiers. This is intended to
</span><span class="cx"> * be used as a safety net while implementing builtins.
</span><span class="cx"> */
</span><ins>+ // FIXME: How can a debug-only assertion be a safety net?
</ins><span class="cx"> if (*ident == vm.propertyNames->builtinNames().callPublicName())
</span><span class="cx"> return false;
</span><span class="cx"> if (*ident == vm.propertyNames->builtinNames().applyPublicName())
</span><span class="lines">@@ -960,11 +1037,10 @@
</span><span class="cx"> return IDENT;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-template <typename T>
-template <bool shouldCreateIdentifier> JSTokenType Lexer<T>::parseIdentifierSlowCase(JSTokenData* tokenData, unsigned lexerFlags, bool strictMode)
</del><ins>+template<typename CharacterType> template<bool shouldCreateIdentifier> JSTokenType Lexer<CharacterType>::parseIdentifierSlowCase(JSTokenData* tokenData, unsigned lexerFlags, bool strictMode)
</ins><span class="cx"> {
</span><span class="cx"> const ptrdiff_t remaining = m_codeEnd - m_code;
</span><del>- const T* identifierStart = currentSourcePtr();
</del><ins>+ auto identifierStart = currentSourcePtr();
</ins><span class="cx"> bool bufferRequired = false;
</span><span class="cx">
</span><span class="cx"> while (true) {
</span><span class="lines">@@ -983,19 +1059,18 @@
</span><span class="cx"> if (UNLIKELY(m_current != 'u'))
</span><span class="cx"> return atEnd() ? UNTERMINATED_IDENTIFIER_ESCAPE_ERRORTOK : INVALID_IDENTIFIER_ESCAPE_ERRORTOK;
</span><span class="cx"> shift();
</span><del>- UnicodeHexValue character = parseFourDigitUnicodeHex();
</del><ins>+ auto character = parseUnicodeEscape();
</ins><span class="cx"> if (UNLIKELY(!character.isValid()))
</span><del>- return character.valueType() == UnicodeHexValue::IncompleteHex ? UNTERMINATED_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK : INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK;
- UChar ucharacter = static_cast<UChar>(character.value());
- if (UNLIKELY(m_buffer16.size() ? !isIdentPart(ucharacter) : !isIdentStart(ucharacter)))
</del><ins>+ return character.isIncomplete() ? UNTERMINATED_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK : INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK;
+ if (UNLIKELY(m_buffer16.size() ? !isIdentPart(character.value()) : !isIdentStart(character.value())))
</ins><span class="cx"> return INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK;
</span><span class="cx"> if (shouldCreateIdentifier)
</span><del>- record16(ucharacter);
</del><ins>+ recordUnicodeCodePoint(character.value());
</ins><span class="cx"> identifierStart = currentSourcePtr();
</span><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> int identifierLength;
</span><del>- const Identifier* ident = 0;
</del><ins>+ const Identifier* ident = nullptr;
</ins><span class="cx"> if (shouldCreateIdentifier) {
</span><span class="cx"> if (!bufferRequired) {
</span><span class="cx"> identifierLength = currentSourcePtr() - identifierStart;
</span><span class="lines">@@ -1008,7 +1083,7 @@
</span><span class="cx">
</span><span class="cx"> tokenData->ident = ident;
</span><span class="cx"> } else
</span><del>- tokenData->ident = 0;
</del><ins>+ tokenData->ident = nullptr;
</ins><span class="cx">
</span><span class="cx"> if (LIKELY(!bufferRequired && !(lexerFlags & LexerFlagsIgnoreReservedWords))) {
</span><span class="cx"> ASSERT(shouldCreateIdentifier);
</span><span class="lines">@@ -1125,21 +1200,22 @@
</span><span class="cx">
</span><span class="cx"> if (m_current == 'u') {
</span><span class="cx"> shift();
</span><del>- UnicodeHexValue character = parseFourDigitUnicodeHex();
- if (character.isValid()) {
</del><ins>+
+ if (escapeParseMode == EscapeParseMode::String && m_current == stringQuoteCharacter) {
</ins><span class="cx"> if (shouldBuildStrings)
</span><del>- record16(character.value());
</del><ins>+ record16('u');
</ins><span class="cx"> return StringParsedSuccessfully;
</span><span class="cx"> }
</span><span class="cx">
</span><del>- if (escapeParseMode == EscapeParseMode::String && m_current == stringQuoteCharacter) {
</del><ins>+ auto character = parseUnicodeEscape();
+ if (character.isValid()) {
</ins><span class="cx"> if (shouldBuildStrings)
</span><del>- record16('u');
</del><ins>+ recordUnicodeCodePoint(character.value());
</ins><span class="cx"> return StringParsedSuccessfully;
</span><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> m_lexErrorMessage = ASCIILiteral("\\u can only be followed by a Unicode character sequence");
</span><del>- return character.valueType() == UnicodeHexValue::IncompleteHex ? StringUnterminated : StringCannotBeParsed;
</del><ins>+ return character.isIncomplete() ? StringUnterminated : StringCannotBeParsed;
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> if (strictMode) {
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreparserLexerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/parser/Lexer.h (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/parser/Lexer.h        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/Source/JavaScriptCore/parser/Lexer.h        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -65,6 +65,8 @@
</span><span class="cx"> LexexFlagsDontBuildKeywords = 4
</span><span class="cx"> };
</span><span class="cx">
</span><ins>+struct ParsedUnicodeEscapeValue;
+
</ins><span class="cx"> template <typename T>
</span><span class="cx"> class Lexer {
</span><span class="cx"> WTF_MAKE_NONCOPYABLE(Lexer);
</span><span class="lines">@@ -138,42 +140,15 @@
</span><span class="cx"> void append8(const T*, size_t);
</span><span class="cx"> void record16(int);
</span><span class="cx"> void record16(T);
</span><ins>+ void recordUnicodeCodePoint(UChar32);
</ins><span class="cx"> void append16(const LChar*, size_t);
</span><span class="cx"> void append16(const UChar* characters, size_t length) { m_buffer16.append(characters, length); }
</span><span class="cx">
</span><span class="cx"> ALWAYS_INLINE void shift();
</span><span class="cx"> ALWAYS_INLINE bool atEnd() const;
</span><span class="cx"> ALWAYS_INLINE T peek(int offset) const;
</span><del>- struct UnicodeHexValue {
-
- enum ValueType { ValidHex, IncompleteHex, InvalidHex };
-
- explicit UnicodeHexValue(int value)
- : m_value(value)
- {
- }
- explicit UnicodeHexValue(ValueType type)
- : m_value(type == IncompleteHex ? -2 : -1)
- {
- }
</del><span class="cx">
</span><del>- ValueType valueType() const
- {
- if (m_value >= 0)
- return ValidHex;
- return m_value == -2 ? IncompleteHex : InvalidHex;
- }
- bool isValid() const { return m_value >= 0; }
- int value() const
- {
- ASSERT(m_value >= 0);
- return m_value;
- }
-
- private:
- int m_value;
- };
- UnicodeHexValue parseFourDigitUnicodeHex();
</del><ins>+ ParsedUnicodeEscapeValue parseUnicodeEscape();
</ins><span class="cx"> void shiftLineTerminator();
</span><span class="cx">
</span><span class="cx"> ALWAYS_INLINE int offsetFromSourcePtr(const T* ptr) const { return ptr - m_codeStart; }
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreruntimeStringConstructorcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/runtime/StringConstructor.cpp (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/runtime/StringConstructor.cpp        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/Source/JavaScriptCore/runtime/StringConstructor.cpp        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -105,7 +105,7 @@
</span><span class="cx">
</span><span class="cx"> uint32_t codePoint = static_cast<uint32_t>(codePointAsDouble);
</span><span class="cx">
</span><del>- if (codePoint != codePointAsDouble || codePoint > 0x10FFFF)
</del><ins>+ if (codePoint != codePointAsDouble || codePoint > UCHAR_MAX_VALUE)
</ins><span class="cx"> return throwVMError(exec, createRangeError(exec, ASCIILiteral("Arguments contain a value that is out of range of code points")));
</span><span class="cx">
</span><span class="cx"> if (U_IS_BMP(codePoint))
</span></span></pre></div>
<a id="trunkSourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/ChangeLog (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/ChangeLog        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/Source/WebCore/ChangeLog        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -1,3 +1,34 @@
</span><ins>+2015-04-29 Darin Adler <darin@apple.com>
+
+ [ES6] Implement Unicode code point escapes
+ https://bugs.webkit.org/show_bug.cgi?id=144377
+
+ Reviewed by Antti Koivisto.
+
+ Test: js/unicode-escape-sequences.html
+
+ * css/CSSParser.cpp:
+ (WebCore::CSSParser::parseEscape): Use ICU's UCHAR_MAX_VALUE instead of writing
+ out 0x10FFFF; clearer this way. Also use our replacementCharacter instead of
+ writing out 0xFFFD.
+
+ * html/parser/HTMLEntityParser.cpp:
+ (WebCore::isAlphaNumeric): Deleted.
+ (WebCore::HTMLEntityParser::legalEntityFor): Use ICU's UCHAR_MAX_VALUE and
+ U_IS_SURROGATE instead of writing the code out. Didn't use U_IS_UNICODE_CHAR
+ because that also excludes noncharacters (U_IS_UNICODE_NONCHAR) and thus would
+ change behavior, but maybe it's something we want to do in the future.
+ (WebCore::HTMLEntityParser::consumeNamedEntity): Use isASCIIAlphanumeric instead
+ of the function in this file that does the same thing less efficiently.
+
+ * html/parser/InputStreamPreprocessor.h:
+ (WebCore::InputStreamPreprocessor::processNextInputCharacter): Use
+ replacementCharacter from CharacterNames.h instead of writing out 0xFFFD.
+
+ * xml/parser/CharacterReferenceParserInlines.h:
+ (WebCore::consumeCharacterReference): Use ICU's UCHAR_MAX_VALUE instead of
+ defining our own local highestValidCharacter constant.
+
</ins><span class="cx"> 2015-04-29 Martin Robinson <mrobinson@igalia.com>
</span><span class="cx">
</span><span class="cx"> [CMake] [GTK] Organize and clean up unused CMake variables
</span></span></pre></div>
<a id="trunkSourceWebCorecssCSSParsercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/css/CSSParser.cpp (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/css/CSSParser.cpp        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/Source/WebCore/css/CSSParser.cpp        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -10762,9 +10762,8 @@
</span><span class="cx"> unicode = (unicode << 4) + toASCIIHexValue(*src++);
</span><span class="cx"> } while (--length && isASCIIHexDigit(*src));
</span><span class="cx">
</span><del>- // Characters above 0x10ffff are not handled.
- if (unicode > 0x10ffff)
- unicode = 0xfffd;
</del><ins>+ if (unicode > UCHAR_MAX_VALUE)
+ unicode = replacementCharacter;
</ins><span class="cx">
</span><span class="cx"> // Optional space after the escape sequence.
</span><span class="cx"> if (isHTMLSpace(*src))
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLEntityParsercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -32,9 +32,8 @@
</span><span class="cx"> #include "HTMLEntitySearch.h"
</span><span class="cx"> #include "HTMLEntityTable.h"
</span><span class="cx"> #include <wtf/text/StringBuilder.h>
</span><ins>+#include <wtf/unicode/CharacterNames.h>
</ins><span class="cx">
</span><del>-using namespace WTF;
-
</del><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><span class="cx"> static const UChar windowsLatin1ExtensionArray[32] = {
</span><span class="lines">@@ -44,17 +43,12 @@
</span><span class="cx"> 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178, // 98-9F
</span><span class="cx"> };
</span><span class="cx">
</span><del>-static inline bool isAlphaNumeric(UChar cc)
-{
- return (cc >= '0' && cc <= '9') || (cc >= 'a' && cc <= 'z') || (cc >= 'A' && cc <= 'Z');
-}
-
</del><span class="cx"> class HTMLEntityParser {
</span><span class="cx"> public:
</span><span class="cx"> static UChar32 legalEntityFor(UChar32 value)
</span><span class="cx"> {
</span><del>- if (value <= 0 || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF))
- return 0xFFFD;
</del><ins>+ if (value <= 0 || value > UCHAR_MAX_VALUE || U_IS_SURROGATE(value))
+ return replacementCharacter;
</ins><span class="cx"> if ((value & ~0x1F) != 0x80)
</span><span class="cx"> return value;
</span><span class="cx"> return windowsLatin1ExtensionArray[value - 0x80];
</span><span class="lines">@@ -104,7 +98,7 @@
</span><span class="cx"> }
</span><span class="cx"> if (entitySearch.mostRecentMatch()->lastCharacter() == ';'
</span><span class="cx"> || !additionalAllowedCharacter
</span><del>- || !(isAlphaNumeric(cc) || cc == '=')) {
</del><ins>+ || !(isASCIIAlphanumeric(cc) || cc == '=')) {
</ins><span class="cx"> decodedEntity.append(entitySearch.mostRecentMatch()->firstValue);
</span><span class="cx"> if (entitySearch.mostRecentMatch()->secondValue)
</span><span class="cx"> decodedEntity.append(entitySearch.mostRecentMatch()->secondValue);
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserInputStreamPreprocessorh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -30,6 +30,7 @@
</span><span class="cx">
</span><span class="cx"> #include "SegmentedString.h"
</span><span class="cx"> #include <wtf/Noncopyable.h>
</span><ins>+#include <wtf/unicode/CharacterNames.h>
</ins><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><span class="lines">@@ -115,7 +116,7 @@
</span><span class="cx"> m_nextInputCharacter = source.currentChar();
</span><span class="cx"> goto ProcessAgain;
</span><span class="cx"> }
</span><del>- m_nextInputCharacter = 0xFFFD;
</del><ins>+ m_nextInputCharacter = replacementCharacter;
</ins><span class="cx"> }
</span><span class="cx"> }
</span><span class="cx"> return true;
</span></span></pre></div>
<a id="trunkSourceWebCorexmlparserCharacterReferenceParserInlinesh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h (183551 => 183552)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h        2015-04-29 16:32:05 UTC (rev 183551)
+++ trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h        2015-04-29 16:33:12 UTC (rev 183552)
</span><span class="lines">@@ -54,7 +54,6 @@
</span><span class="cx"> } state = Initial;
</span><span class="cx"> UChar32 result = 0;
</span><span class="cx"> bool overflow = false;
</span><del>- const UChar32 highestValidCharacter = 0x10FFFF;
</del><span class="cx"> StringBuilder consumedCharacters;
</span><span class="cx">
</span><span class="cx"> while (!source.isEmpty()) {
</span><span class="lines">@@ -107,7 +106,7 @@
</span><span class="cx"> Hex:
</span><span class="cx"> if (isASCIIHexDigit(character)) {
</span><span class="cx"> result = result * 16 + toASCIIHexValue(character);
</span><del>- if (result > highestValidCharacter)
</del><ins>+ if (result > UCHAR_MAX_VALUE)
</ins><span class="cx"> overflow = true;
</span><span class="cx"> break;
</span><span class="cx"> }
</span><span class="lines">@@ -126,7 +125,7 @@
</span><span class="cx"> Decimal:
</span><span class="cx"> if (isASCIIDigit(character)) {
</span><span class="cx"> result = result * 10 + character - '0';
</span><del>- if (result > highestValidCharacter)
</del><ins>+ if (result > UCHAR_MAX_VALUE)
</ins><span class="cx"> overflow = true;
</span><span class="cx"> break;
</span><span class="cx"> }
</span></span></pre>
</div>
</div>
</body>
</html>