[Webkit-unassigned] [Bug 147604] [INTL] Implement Collator Compare Functions

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Sat Oct 31 15:29:45 PDT 2015


https://bugs.webkit.org/show_bug.cgi?id=147604

Darin Adler <darin at apple.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #263825|review?                     |review+
              Flags|                            |

--- Comment #9 from Darin Adler <darin at apple.com> ---
Comment on attachment 263825
  --> https://bugs.webkit.org/attachment.cgi?id=263825
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=263825&action=review

Looks OK. I see a lot of room for refinement and improvement.

> Source/JavaScriptCore/runtime/IntlCollator.h:61
> +    bool initializedCollator() const { return m_initializedCollator; }

The naming here is unfortunate. This sounds like a function that returns a collator that has been initialized.

> Source/JavaScriptCore/runtime/IntlCollator.h:62
> +    void setInitializedCollator(bool initializedCollator) { m_initializedCollator = initializedCollator; }

The naming here is unfortunate. This sounds like a function that takes a collator as an argument, not a function that takes a boolean.

Using public functions for this is taking the standards language too literally. We should keep this boolean private and write a member function that initializes if needed and returns if not.

> Source/JavaScriptCore/runtime/IntlCollatorConstructor.h:63
> +IntlCollator* initializeCollator(ExecState*, IntlCollator*, JSValue locales, JSValue optionsValue);

The interface to this function is strange. Why does the function both take a collator as an argument and return one. If the function needs to indicate failure, I suggest it return a boolean or error code. The ExecState argument should be ExecState&, not ExecState*, and the collator argument should be IntlCollator&, not IntlCollator*.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:90
> +static int compareStrings(IntlCollator* collator, const String& x, const String& y)

This should take an IntlCollator&, not an IntlCollator*.

Also should probably take two StringView arguments, not two const String& arguments.

We’re going to need performance tests for this. Right now this is super-slow and we want a version that has reasonably fast performance.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:94
> +    UCollator* icuCollator = ucol_open(collator->locale().utf8().data(), &status);

For acceptable performance, I think it’s important that we not use ucol_open and ucol_close every time. I tried that in code in WebCore and found that everything was much too slow. I think we probably need to put the UCollator in the IntlCollator, not create a new one every time.

Doing all that work to initialize the collator attributes every time we compare a pair of strings is going to result in very poor performance.

We also need to come up with a fast path so we aren’t converting simple strings with all ASCII characters to 16 bit just to call ICU with them. Again, we’ve run into this in the past. We need performance tests for this.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:96
> +        return codePointCompare(x, y);

It is really OK to silently do the wrong thing if ucol_open fails?

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:100
> +    const String& sensitivity = collator->sensitivity();

Is this attribute string itself supposed to be case folding or case sensitive? This is not a good pattern, doing string comparisons to convert the collator’s sensitivity setting into arguments for the ICU collator every time we compare a string.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:129
> +        return codePointCompare(x, y);

It is really OK to silently do the wrong thing if one of the ucol_setAttribute calls fail?

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:134
> +    String x16Bit = x.is8Bit() ? String::make16BitFrom8BitSource(x.characters8(), x.length()) : x;
> +    String y16Bit = y.is8Bit() ? String::make16BitFrom8BitSource(y.characters8(), y.length()) : y;
> +    UCollationResult result = ucol_strcoll(icuCollator, x16Bit.characters16(), x16Bit.length(), y16Bit.characters16(), y16Bit.length());

This is the wrong idiom for making a temporary 16-bit copy of a string. The correct idiom is to use upconvertedCharacters(). It works like this:

   UCollationResult result = ucol_strcoll(icuCollator, StringView(x).upconvertedCharacters(), x.length(), StringView(y).upconvertedCharacters(), y.length());

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:147
> +    switch (result) {
> +    case UCOL_EQUAL:
> +        return 0;
> +    case UCOL_GREATER:
> +        return 1;
> +    case UCOL_LESS:
> +        return -1;
> +    }
> +    ASSERT_NOT_REACHED();
> +    return 0;

It’s OK to try to use some abstraction here and not assume the values, but it’s really not useful. This named constants are just ICU names for these same values. There is no benefit to the switch. This can and should just be:

    return result;

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:157
> +    RELEASE_ASSERT(collator);

I don’t see why we would have this assertion in release code.

> Source/JavaScriptCore/runtime/IntlObject.cpp:118
> +    return String(uloc_getDefault()).replace('_', '-');

We should stop copying and pasting the replace we are doing here.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.webkit.org/pipermail/webkit-unassigned/attachments/20151031/7a539c29/attachment.html>


More information about the webkit-unassigned mailing list