[Webkit-unassigned] [Bug 147604] [INTL] Implement Collator Compare Functions

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Nov 17 08:49:34 PST 2015


https://bugs.webkit.org/show_bug.cgi?id=147604

Darin Adler <darin at apple.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #265024|review?                     |review+
              Flags|                            |

--- Comment #16 from Darin Adler <darin at apple.com> ---
Comment on attachment 265024
  --> https://bugs.webkit.org/attachment.cgi?id=265024
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=265024&action=review

Looks good.

We need some performance tests here. This code is highly likely to be super-slow because the memory allocation; much of it is unnecessary. Performance tests will help us make the right decisions about further optimization.

Code looks generally good but we need a lot more test coverage. I see lots of what might be minor mistakes in the code and the tests could show us if these are OK or if these are actually exposed to the JavaScript code.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:91
> +static Vector<String> sortLocaleData(const String& locale, const String& key)

I don’t like the interface to this function. If “key” can only be “co” or “kn”, then why are we passing a string, and not something more like an enum? It’s not good to have a function that takes a string, but only accepts a limited set of values, and requires the caller to do that checking (since it does ASSERT_NOT_REACHED).

> Source/JavaScriptCore/runtime/IntlCollator.cpp:120
> +                String collation(keywordValue);
> +
> +                // 10.2.3 "The values "standard" and "search" must not be used as elements in any [[sortLocaleData]][locale].co and [[searchLocaleData]][locale].co array."
> +                if (collation == "standard" || collation == "search")
> +                    continue;
> +
> +                // Map keyword values to BCP 47 equivalents.
> +                if (collation == "dictionary")
> +                    collation = ASCIILiteral("dict");
> +                else if (collation == "gb2312han")
> +                    collation = ASCIILiteral("gb2312");
> +                else if (collation == "phonebook")
> +                    collation = ASCIILiteral("phonebk");
> +                else if (collation == "traditional")
> +                    collation = ASCIILiteral("trad");
> +
> +                keyLocaleData.append(collation);

Would be nice to optimize this to only create a String after determining that it’s what we want to append. We can use strcmp on keywordValue instead of converting to a String and using ==.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:126
> +        keyLocaleData.append(ASCIILiteral("false"));
> +        keyLocaleData.append(ASCIILiteral("true"));

Should use reserveInitialCapacity and uncheckedAppend.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:132
> +static Vector<String> searchLocaleData(const String&, const String& key)

I don’t like the interface to this function. If “key” can only be “co”, “kn”, or “sensitivity”, then why are we passing a string, and not something more like an enum? It’s not good to have a function that takes a string, but only accepts a limited set of values, and requires the caller to do that checking (since it does ASSERT_NOT_REACHED).

> Source/JavaScriptCore/runtime/IntlCollator.cpp:138
> +        keyLocaleData.append(String());

Should use reserveInitialCapacity and uncheckedAppend.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:141
> +        keyLocaleData.append(ASCIILiteral("false"));
> +        keyLocaleData.append(ASCIILiteral("true"));

Should use reserveInitialCapacity and uncheckedAppend.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:144
> +        keyLocaleData.append("variant");

Should use reserveInitialCapacity and uncheckedAppend.

Missing ASCIILiteral here.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:150
> +void IntlCollator::initializeCollator(ExecState& exec, JSValue locales, JSValue optionsValue)

In new code we should name the argument “state” rather than “exec”.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:155
> +    m_initializedCollator = true;

Why set this here instead of at the end of the function? The pseudo code only sets it once we are past all the possible exceptions. Do we have test cases covering this?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:157
> +    // 1. If collator has an [[initializedIntlObject]] internal slot with value true, throw a TypeError exception.

Where is that step? Seems missing.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:180
> +    const HashSet<String> usages({ ASCIILiteral("sort"), ASCIILiteral("search") });

Does this really need to be a HashSet? It’s quite inefficient to create and destroy a HashSet every time through this function. And if the set has a small number of entries, HashSet is not as efficient as, say, a sorted list with binary search, or even a linear search through an array. We can write the function so it takes a std::array or a std::initializer_list and make it much more efficient.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:181
> +    String usageString = getIntlStringOption(&exec, options, exec.vm().propertyNames->usage, usages, "usage must be either \"sort\" or \"search\"", ASCIILiteral("sort"));

WebKit coding style does not use “get” in the names of functions like this one.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:207
> +    const HashSet<String> matchers({ ASCIILiteral("lookup"), ASCIILiteral("best fit") });

Same comment about the use of HashSet.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:237
> +        const HashSet<String> caseFirsts({ ASCIILiteral("upper"), ASCIILiteral("lower"), ASCIILiteral("false") });

Same comment about the use of HashSet.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:285
> +    const HashSet<String> sensitivities({ ASCIILiteral("base"), ASCIILiteral("accent"), ASCIILiteral("case"), ASCIILiteral("variant") });

Same comment about the use of HashSet.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:337
> +        m_collator = ucol_open(m_locale.utf8().data(), &status);

Is it safe to do this if m_initializedCollator is false? We need test cases that cover this.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:338
> +        RELEASE_ASSERT(U_SUCCESS(status));

Why RELEASE_ASSERT here? Seems overkill, but maybe I am missing something.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:342
> +        switch (m_sensitivity) {

The value of m_sensitivity could be uninitialized here. What is this function supposed to do when initializedCollator is false?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:362
> +        UColAttributeValue numericCollation = m_numeric ? UCOL_ON : UCOL_OFF;

The value of m_numeric could be uninitialized here. What is this function supposed to do when initializedCollator is false?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:366
> +        // FIXME: Setting UCOL_ALTERNATE_HANDLING to UCOL_SHIFTED causes punctuation and whitespace to be
> +        // ignored. There is currently no way to ignore only punctuation.

We need to construct test cases showing this problem, rather than just a FIXME.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:367
> +        UColAttributeValue alternateHandling = m_ignorePunctuation ? UCOL_SHIFTED : UCOL_DEFAULT;

The value of m_ignorePunctuation could be uninitialized here. What is this function supposed to do when initializedCollator is false?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:373
> +        RELEASE_ASSERT(U_SUCCESS(status));

Why RELEASE_ASSERT here? Why is a single RELEASE_ASSERT sufficient?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:375
> +    return ucol_strcoll(m_collator, x.upconvertedCharacters(), x.length(), y.upconvertedCharacters(), y.length());

If the strings are entirely ASCII, we could use ucol_strcollUTF8. But since it’s hard to efficiently check if a string is UTF-8, the right thing to do to optimize is probably to use ucol_strcollIter, and write a UCharIterator that understands how to iterate a StringView.

An example of how to do this with ucol.h can be seen in the CollatorICU.cpp file. The Collator::collate function does all of this.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:378
> +const char* IntlCollator::getUsageString(Usage usage)

WebKit coding style does not use “get” in the names of functions like this one.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:390
> +const char* IntlCollator::getSensitivityString(Sensitivity sensitivity)

WebKit coding style does not use “get” in the names of functions like this one.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:406
> +JSObject* IntlCollator::resolvedOptions(ExecState& exec)

In new code we should call this argument “state” rather than “exec”.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:418
> +    options->putDirect(vm, vm.propertyNames->locale, jsString(&exec, m_locale));

Will this do the right thing when m_locale is null? We need test cases that cover this, including all the early exit cases that can happen in initializeCollator.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:419
> +    options->putDirect(vm, vm.propertyNames->usage, jsString(&exec, ASCIILiteral(getUsageString(m_usage))));

The value of m_usage could be uninitialized here, so we could actually hit the ASSERT_NOT_REACHED. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:420
> +    options->putDirect(vm, vm.propertyNames->sensitivity, jsString(&exec, ASCIILiteral(getSensitivityString(m_sensitivity))));

The value of m_sensitivity could be uninitialized here, so we could actually hit the ASSERT_NOT_REACHED. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:421
> +    options->putDirect(vm, vm.propertyNames->ignorePunctuation, jsBoolean(m_ignorePunctuation));

The value of m_ignorePunctuation could be uninitialized here. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:422
> +    options->putDirect(vm, vm.propertyNames->collation, jsString(&exec, m_collation));

Will this do the right thing when m_collation is null? We need test cases that cover this, including all the early exit cases that can happen in initializeCollator.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:423
> +    options->putDirect(vm, vm.propertyNames->numeric, jsBoolean(m_numeric));

The value of m_numeric could be uninitialized here. We need test cases that cover this, including all the early exit cases that can happen in initializeCollator, and I suggest initializing this data member in the class definition in the header.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:429
> +    m_boundCompare.set(vm, this, format);

Do we have test cases covering what happens when we pass in an object of the wrong type?

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:86
> +static EncodedJSValue JSC_HOST_CALL IntlCollatorFuncCompare(ExecState* exec)

In new code this argument should be named “state” rather than “exec”.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.webkit.org/pipermail/webkit-unassigned/attachments/20151117/a6525dfc/attachment-0001.html>


More information about the webkit-unassigned mailing list