<html>
    <head>
      <base href="https://bugs.webkit.org/" />
    </head>
    <body><span class="vcard"><a class="email" href="mailto:darin&#64;apple.com" title="Darin Adler &lt;darin&#64;apple.com&gt;"> <span class="fn">Darin Adler</span></a>
</span> changed
              <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [INTL] Implement Collator Compare Functions"
   href="https://bugs.webkit.org/show_bug.cgi?id=147604">bug 147604</a>
        <br>
             <table border="1" cellspacing="0" cellpadding="8">
          <tr>
            <th>What</th>
            <th>Removed</th>
            <th>Added</th>
          </tr>

         <tr>
           <td style="text-align:right;">Attachment #265024 Flags</td>
           <td>review?
           </td>
           <td>review+
           </td>
         </tr></table>
      <p>
        <div>
            <b><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [INTL] Implement Collator Compare Functions"
   href="https://bugs.webkit.org/show_bug.cgi?id=147604#c16">Comment # 16</a>
              on <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [INTL] Implement Collator Compare Functions"
   href="https://bugs.webkit.org/show_bug.cgi?id=147604">bug 147604</a>
              from <span class="vcard"><a class="email" href="mailto:darin&#64;apple.com" title="Darin Adler &lt;darin&#64;apple.com&gt;"> <span class="fn">Darin Adler</span></a>
</span></b>
        <pre>Comment on <span class=""><a href="attachment.cgi?id=265024&amp;action=diff" name="attach_265024" title="Patch">attachment 265024</a> <a href="attachment.cgi?id=265024&amp;action=edit" title="Patch">[details]</a></span>
Patch

View in context: <a href="https://bugs.webkit.org/attachment.cgi?id=265024&amp;action=review">https://bugs.webkit.org/attachment.cgi?id=265024&amp;action=review</a>

Looks good.

We need some performance tests here. This code is highly likely to be super-slow because the memory allocation; much of it is unnecessary. Performance tests will help us make the right decisions about further optimization.

Code looks generally good but we need a lot more test coverage. I see lots of what might be minor mistakes in the code and the tests could show us if these are OK or if these are actually exposed to the JavaScript code.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:91
&gt; +static Vector&lt;String&gt; sortLocaleData(const String&amp; locale, const String&amp; key)</span >

I don’t like the interface to this function. If “key” can only be “co” or “kn”, then why are we passing a string, and not something more like an enum? It’s not good to have a function that takes a string, but only accepts a limited set of values, and requires the caller to do that checking (since it does ASSERT_NOT_REACHED).

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:120
&gt; +                String collation(keywordValue);
&gt; +
&gt; +                // 10.2.3 &quot;The values &quot;standard&quot; and &quot;search&quot; must not be used as elements in any [[sortLocaleData]][locale].co and [[searchLocaleData]][locale].co array.&quot;
&gt; +                if (collation == &quot;standard&quot; || collation == &quot;search&quot;)
&gt; +                    continue;
&gt; +
&gt; +                // Map keyword values to BCP 47 equivalents.
&gt; +                if (collation == &quot;dictionary&quot;)
&gt; +                    collation = ASCIILiteral(&quot;dict&quot;);
&gt; +                else if (collation == &quot;gb2312han&quot;)
&gt; +                    collation = ASCIILiteral(&quot;gb2312&quot;);
&gt; +                else if (collation == &quot;phonebook&quot;)
&gt; +                    collation = ASCIILiteral(&quot;phonebk&quot;);
&gt; +                else if (collation == &quot;traditional&quot;)
&gt; +                    collation = ASCIILiteral(&quot;trad&quot;);
&gt; +
&gt; +                keyLocaleData.append(collation);</span >

Would be nice to optimize this to only create a String after determining that it’s what we want to append. We can use strcmp on keywordValue instead of converting to a String and using ==.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:126
&gt; +        keyLocaleData.append(ASCIILiteral(&quot;false&quot;));
&gt; +        keyLocaleData.append(ASCIILiteral(&quot;true&quot;));</span >

Should use reserveInitialCapacity and uncheckedAppend.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:132
&gt; +static Vector&lt;String&gt; searchLocaleData(const String&amp;, const String&amp; key)</span >

I don’t like the interface to this function. If “key” can only be “co”, “kn”, or “sensitivity”, then why are we passing a string, and not something more like an enum? It’s not good to have a function that takes a string, but only accepts a limited set of values, and requires the caller to do that checking (since it does ASSERT_NOT_REACHED).

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:138
&gt; +        keyLocaleData.append(String());</span >

Should use reserveInitialCapacity and uncheckedAppend.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:141
&gt; +        keyLocaleData.append(ASCIILiteral(&quot;false&quot;));
&gt; +        keyLocaleData.append(ASCIILiteral(&quot;true&quot;));</span >

Should use reserveInitialCapacity and uncheckedAppend.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:144
&gt; +        keyLocaleData.append(&quot;variant&quot;);</span >

Should use reserveInitialCapacity and uncheckedAppend.

Missing ASCIILiteral here.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:150
&gt; +void IntlCollator::initializeCollator(ExecState&amp; exec, JSValue locales, JSValue optionsValue)</span >

In new code we should name the argument “state” rather than “exec”.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:155
&gt; +    m_initializedCollator = true;</span >

Why set this here instead of at the end of the function? The pseudo code only sets it once we are past all the possible exceptions. Do we have test cases covering this?

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:157
&gt; +    // 1. If collator has an [[initializedIntlObject]] internal slot with value true, throw a TypeError exception.</span >

Where is that step? Seems missing.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:180
&gt; +    const HashSet&lt;String&gt; usages({ ASCIILiteral(&quot;sort&quot;), ASCIILiteral(&quot;search&quot;) });</span >

Does this really need to be a HashSet? It’s quite inefficient to create and destroy a HashSet every time through this function. And if the set has a small number of entries, HashSet is not as efficient as, say, a sorted list with binary search, or even a linear search through an array. We can write the function so it takes a std::array or a std::initializer_list and make it much more efficient.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:181
&gt; +    String usageString = getIntlStringOption(&amp;exec, options, exec.vm().propertyNames-&gt;usage, usages, &quot;usage must be either \&quot;sort\&quot; or \&quot;search\&quot;&quot;, ASCIILiteral(&quot;sort&quot;));</span >

WebKit coding style does not use “get” in the names of functions like this one.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:207
&gt; +    const HashSet&lt;String&gt; matchers({ ASCIILiteral(&quot;lookup&quot;), ASCIILiteral(&quot;best fit&quot;) });</span >

Same comment about the use of HashSet.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:237
&gt; +        const HashSet&lt;String&gt; caseFirsts({ ASCIILiteral(&quot;upper&quot;), ASCIILiteral(&quot;lower&quot;), ASCIILiteral(&quot;false&quot;) });</span >

Same comment about the use of HashSet.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:285
&gt; +    const HashSet&lt;String&gt; sensitivities({ ASCIILiteral(&quot;base&quot;), ASCIILiteral(&quot;accent&quot;), ASCIILiteral(&quot;case&quot;), ASCIILiteral(&quot;variant&quot;) });</span >

Same comment about the use of HashSet.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:337
&gt; +        m_collator = ucol_open(m_locale.utf8().data(), &amp;status);</span >

Is it safe to do this if m_initializedCollator is false? We need test cases that cover this.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:338
&gt; +        RELEASE_ASSERT(U_SUCCESS(status));</span >

Why RELEASE_ASSERT here? Seems overkill, but maybe I am missing something.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:342
&gt; +        switch (m_sensitivity) {</span >

The value of m_sensitivity could be uninitialized here. What is this function supposed to do when initializedCollator is false?

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:362
&gt; +        UColAttributeValue numericCollation = m_numeric ? UCOL_ON : UCOL_OFF;</span >

The value of m_numeric could be uninitialized here. What is this function supposed to do when initializedCollator is false?

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:366
&gt; +        // FIXME: Setting UCOL_ALTERNATE_HANDLING to UCOL_SHIFTED causes punctuation and whitespace to be
&gt; +        // ignored. There is currently no way to ignore only punctuation.</span >

We need to construct test cases showing this problem, rather than just a FIXME.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:367
&gt; +        UColAttributeValue alternateHandling = m_ignorePunctuation ? UCOL_SHIFTED : UCOL_DEFAULT;</span >

The value of m_ignorePunctuation could be uninitialized here. What is this function supposed to do when initializedCollator is false?

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:373
&gt; +        RELEASE_ASSERT(U_SUCCESS(status));</span >

Why RELEASE_ASSERT here? Why is a single RELEASE_ASSERT sufficient?

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:375
&gt; +    return ucol_strcoll(m_collator, x.upconvertedCharacters(), x.length(), y.upconvertedCharacters(), y.length());</span >

If the strings are entirely ASCII, we could use ucol_strcollUTF8. But since it’s hard to efficiently check if a string is UTF-8, the right thing to do to optimize is probably to use ucol_strcollIter, and write a UCharIterator that understands how to iterate a StringView.

An example of how to do this with ucol.h can be seen in the CollatorICU.cpp file. The Collator::collate function does all of this.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:378
&gt; +const char* IntlCollator::getUsageString(Usage usage)</span >

WebKit coding style does not use “get” in the names of functions like this one.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:390
&gt; +const char* IntlCollator::getSensitivityString(Sensitivity sensitivity)</span >

WebKit coding style does not use “get” in the names of functions like this one.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:406
&gt; +JSObject* IntlCollator::resolvedOptions(ExecState&amp; exec)</span >

In new code we should call this argument “state” rather than “exec”.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:418
&gt; +    options-&gt;putDirect(vm, vm.propertyNames-&gt;locale, jsString(&amp;exec, m_locale));</span >

Will this do the right thing when m_locale is null? We need test cases that cover this, including all the early exit cases that can happen in initializeCollator.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:419
&gt; +    options-&gt;putDirect(vm, vm.propertyNames-&gt;usage, jsString(&amp;exec, ASCIILiteral(getUsageString(m_usage))));</span >

The value of m_usage could be uninitialized here, so we could actually hit the ASSERT_NOT_REACHED. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:420
&gt; +    options-&gt;putDirect(vm, vm.propertyNames-&gt;sensitivity, jsString(&amp;exec, ASCIILiteral(getSensitivityString(m_sensitivity))));</span >

The value of m_sensitivity could be uninitialized here, so we could actually hit the ASSERT_NOT_REACHED. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:421
&gt; +    options-&gt;putDirect(vm, vm.propertyNames-&gt;ignorePunctuation, jsBoolean(m_ignorePunctuation));</span >

The value of m_ignorePunctuation could be uninitialized here. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:422
&gt; +    options-&gt;putDirect(vm, vm.propertyNames-&gt;collation, jsString(&amp;exec, m_collation));</span >

Will this do the right thing when m_collation is null? We need test cases that cover this, including all the early exit cases that can happen in initializeCollator.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:423
&gt; +    options-&gt;putDirect(vm, vm.propertyNames-&gt;numeric, jsBoolean(m_numeric));</span >

The value of m_numeric could be uninitialized here. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollator.cpp:429
&gt; +    m_boundCompare.set(vm, this, format);</span >

Do we have test cases covering what happens when we pass in an object of the wrong type?

<span class="quote">&gt; Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:86
&gt; +static EncodedJSValue JSC_HOST_CALL IntlCollatorFuncCompare(ExecState* exec)</span >

In new code this argument should be named “state” rather than “exec”.</pre>
        </div>
      </p>
      <hr>
      <span>You are receiving this mail because:</span>
      
      <ul>
          <li>You are the assignee for the bug.</li>
      </ul>
    </body>
</html>