<html>
<head>
<base href="https://bugs.webkit.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [INTL] Implement Collator Compare Functions"
href="https://bugs.webkit.org/show_bug.cgi?id=147604#c18">Comment # 18</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [INTL] Implement Collator Compare Functions"
href="https://bugs.webkit.org/show_bug.cgi?id=147604">bug 147604</a>
from <span class="vcard"><a class="email" href="mailto:sukolsak@gmail.com" title="Sukolsak Sakshuwong <sukolsak@gmail.com>"> <span class="fn">Sukolsak Sakshuwong</span></a>
</span></b>
<pre>Thanks for the review.
(In reply to <a href="show_bug.cgi?id=147604#c16">comment #16</a>)
<span class="quote">> Comment on <span class="bz_obsolete"><a href="attachment.cgi?id=265024&action=diff" name="attach_265024" title="Patch">attachment 265024</a> <a href="attachment.cgi?id=265024&action=edit" title="Patch">[details]</a></span>
> Patch
>
> View in context:
> <a href="https://bugs.webkit.org/attachment.cgi?id=265024&action=review">https://bugs.webkit.org/attachment.cgi?id=265024&action=review</a>
>
> Looks good.
>
> We need some performance tests here. This code is highly likely to be
> super-slow because the memory allocation; much of it is unnecessary.
> Performance tests will help us make the right decisions about further
> optimization.</span >
I have removed some unnecessary memory allocation in this patch. Could you please let me know where is the right place to add performance tests for this? Is it JavaScriptCore/tests/stress?
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:91
> > +static Vector<String> sortLocaleData(const String& locale, const String& key)
>
> I don’t like the interface to this function. If “key” can only be “co” or
> “kn”, then why are we passing a string, and not something more like an enum?
> It’s not good to have a function that takes a string, but only accepts a
> limited set of values, and requires the caller to do that checking (since it
> does ASSERT_NOT_REACHED).</span >
Fixed by using array indices. I'm not sure if it's the best way to do it. I tried using enums but found the code too ugly, because resolveLocale() needs to work with other classes. So I had to cast enum to int back and forth and pass a function for converting enums to strings to resolveLocale().
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:120
> > + String collation(keywordValue);
> > +
> > + // 10.2.3 "The values "standard" and "search" must not be used as elements in any [[sortLocaleData]][locale].co and [[searchLocaleData]][locale].co array."
> > + if (collation == "standard" || collation == "search")
> > + continue;
> > +
> > + // Map keyword values to BCP 47 equivalents.
> > + if (collation == "dictionary")
> > + collation = ASCIILiteral("dict");
> > + else if (collation == "gb2312han")
> > + collation = ASCIILiteral("gb2312");
> > + else if (collation == "phonebook")
> > + collation = ASCIILiteral("phonebk");
> > + else if (collation == "traditional")
> > + collation = ASCIILiteral("trad");
> > +
> > + keyLocaleData.append(collation);
>
> Would be nice to optimize this to only create a String after determining
> that it’s what we want to append. We can use strcmp on keywordValue instead
> of converting to a String and using ==.</span >
Done.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:126
> > + keyLocaleData.append(ASCIILiteral("false"));
> > + keyLocaleData.append(ASCIILiteral("true"));
>
> Should use reserveInitialCapacity and uncheckedAppend.</span >
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:138
> > + keyLocaleData.append(String());
>
> Should use reserveInitialCapacity and uncheckedAppend.</span >
Done.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:150
> > +void IntlCollator::initializeCollator(ExecState& exec, JSValue locales, JSValue optionsValue)
>
> In new code we should name the argument “state” rather than “exec”.</span >
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:181
> > + String usageString = getIntlStringOption(&exec, options, exec.vm().propertyNames->usage, usages, "usage must be either \"sort\" or \"search\"", ASCIILiteral("sort"));
>
> WebKit coding style does not use “get” in the names of functions like this
> one.</span >
Fixed in <a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED FIXED - Fix coding style of Intl code"
href="show_bug.cgi?id=151491">Bug 151491</a>.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:155
> > + m_initializedCollator = true;
>
> Why set this here instead of at the end of the function? The pseudo code
> only sets it once we are past all the possible exceptions. Do we have test
> cases covering this?</span >
Fixed. There are test cases covering this.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:157
> > + // 1. If collator has an [[initializedIntlObject]] internal slot with value true, throw a TypeError exception.
>
> Where is that step? Seems missing.</span >
I don't implement it because I don't think the condition "collator.[[initializedIntlObject]] == true" can ever be true. According to the spec, this is the only place that calls InitializeCollator():
"10.1.2 Intl.Collator([ locales [, options]])
When the Intl.Collator function is called with optional arguments locales and options the following steps are taken:
1. If NewTarget is undefined, let newTarget be the active function object, else let newTarget be NewTarget.
2. Let collator be OrdinaryCreateFromConstructor(newTarget, %CollatorPrototype%).
3. ReturnIfAbrupt(collator).
4. Return InitializeCollator(collator, locales, options)."
It seems that we always call InitializeCollator() with a new object, unless I misunderstand how OrdinaryCreateFromConstructor works. So, the object should never have the internal slot "initializedIntlObject" set.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:180
> > + const HashSet<String> usages({ ASCIILiteral("sort"), ASCIILiteral("search") });
>
> Does this really need to be a HashSet? It’s quite inefficient to create and
> destroy a HashSet every time through this function. And if the set has a
> small number of entries, HashSet is not as efficient as, say, a sorted list
> with binary search, or even a linear search through an array. We can write
> the function so it takes a std::array or a std::initializer_list and make it
> much more efficient.</span >
Fixed by using std::initializer_list.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:337
> > + m_collator = ucol_open(m_locale.utf8().data(), &status);
>
> Is it safe to do this if m_initializedCollator is false? We need test cases
> that cover this.</span >
m_initializedCollator can never be false if we reach this point. I have made it a bit more explicit in the new patch.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:338
> > + RELEASE_ASSERT(U_SUCCESS(status));
>
> Why RELEASE_ASSERT here? Seems overkill, but maybe I am missing something.</span >
Fixed by throwing an exception instead.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:342
> > + switch (m_sensitivity) {
>
> The value of m_sensitivity could be uninitialized here. What is this
> function supposed to do when initializedCollator is false?
>
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:362
> > + UColAttributeValue numericCollation = m_numeric ? UCOL_ON : UCOL_OFF;
>
> The value of m_numeric could be uninitialized here. What is this function
> supposed to do when initializedCollator is false?
> </span >
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:418
> > + options->putDirect(vm, vm.propertyNames->locale, jsString(&exec, m_locale));
>
> Will this do the right thing when m_locale is null? We need test cases that
> cover this, including all the early exit cases that can happen in
> initializeCollator.
>
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:419
> > + options->putDirect(vm, vm.propertyNames->usage, jsString(&exec, ASCIILiteral(getUsageString(m_usage))));
>
> The value of m_usage could be uninitialized here, so we could actually hit
> the ASSERT_NOT_REACHED. We need test cases that cover this, including all
> the early exit cases that can happen in initializeCollator, and I suggest
> initializing this data member in the class definition in the header.</span >
There are two ways to get a Collator object:
1. Create a Collator object with "new Intl.Collator(...)" or "Intl.Collator(...)". initializeCollator() is called when the object is created. If initializeCollator() succeeds, all these values are initialized. If an exception is thrown, the Collator object is never returned to the user. So it's not possible to read uninitialized values of Collator objects that are created this way.
2. Use Intl.Collator.prototype. The spec says "The Intl.Collator prototype object is itself an Intl.Collator instance ... whose internal slots are set as if it had been constructed by the expression Construct(%Collator%)" (ECMA-402, 10.3). However, I don't want to call initializeCollator() when Intl.Collator.prototype is created, because Intl.Collator.prototype is created when JSGlobalObject is created. So, I do lazy initialization: check whether the object has been initialized before any read of its values. If not, initialize it. There are only two methods that read the values: resolvedOptions() and compareStrings(). The new patch should make it more explicit.
So, I don't think it's possible to create a test case that reads uninitialized values. We do have test cases where we check the default options and test the default behaviors.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:366
> > + // FIXME: Setting UCOL_ALTERNATE_HANDLING to UCOL_SHIFTED causes punctuation and whitespace to be
> > + // ignored. There is currently no way to ignore only punctuation.
>
> We need to construct test cases showing this problem, rather than just a
> FIXME.</span >
Done.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:373
> > + RELEASE_ASSERT(U_SUCCESS(status));
>
> Why RELEASE_ASSERT here? Why is a single RELEASE_ASSERT sufficient?</span >
Fixed by throwing an exception and adding multiple checks for U_FAILURE(status).
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:375
> > + return ucol_strcoll(m_collator, x.upconvertedCharacters(), x.length(), y.upconvertedCharacters(), y.length());
>
> If the strings are entirely ASCII, we could use ucol_strcollUTF8. But since
> it’s hard to efficiently check if a string is UTF-8, the right thing to do
> to optimize is probably to use ucol_strcollIter, and write a UCharIterator
> that understands how to iterate a StringView.
>
> An example of how to do this with ucol.h can be seen in the CollatorICU.cpp
> file. The Collator::collate function does all of this.</span >
Done.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:378
> > +const char* IntlCollator::getUsageString(Usage usage)
>
> WebKit coding style does not use “get” in the names of functions like this
> one.
>
> > Source/JavaScriptCore/runtime/IntlCollator.cpp:390
> > +const char* IntlCollator::getSensitivityString(Sensitivity sensitivity)
>
> WebKit coding style does not use “get” in the names of functions like this
> one.</span >
Fixed.
<span class="quote">> > Source/JavaScriptCore/runtime/IntlCollator.cpp:429
> > + m_boundCompare.set(vm, this, format);
>
> Do we have test cases covering what happens when we pass in an object of the
> wrong type?</span >
If you mean something like "new Intl.Collator().compare.call(5, 'a', 'b')", we do. See <a href="http://trac.webkit.org/browser/trunk/LayoutTests/js/script-tests/intl-collator.js#L229">http://trac.webkit.org/browser/trunk/LayoutTests/js/script-tests/intl-collator.js#L229</a></pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are the assignee for the bug.</li>
</ul>
</body>
</html>