[Webkit-unassigned] [Bug 147604] [INTL] Implement Collator Compare Functions

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Dec 15 08:45:34 PST 2015


https://bugs.webkit.org/show_bug.cgi?id=147604

Darin Adler <darin at apple.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #267366|review?                     |review-
              Flags|                            |

--- Comment #19 from Darin Adler <darin at apple.com> ---
Comment on attachment 267366
  --> https://bugs.webkit.org/attachment.cgi?id=267366
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=267366&action=review

Looks like great work. Many good improvements, not just the new feature.

review- because the string object lifetime mistake in IntlCollatorFuncCompare is a serious one.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:104
> +        keyLocaleData.append(String());

Alternate syntax for null in this context is { } rather than String(). I slightly prefer the former.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:125
> +                keyLocaleData.append(String(collation));

This treats the keyword value strings as ASCII or Latin-1, not UTF-8. Is that correct? Also, an explicit conversion to String() is not needed; please just write append(collation).

To create the strings as UTF-8, we would use String::fromUTF8(collation) but we also have to consider that the string might be invalid UTF-8 which would result in a null string.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:150
> +        keyLocaleData.append(String());

Alternate syntax for null in this context is { } rather than String(). I slightly prefer the former.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:170
> +    Vector<String> requestedLocales = canonicalizeLocaleList(state, locales);

I like to use auto for the return value type in a case like this.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:220
> +    opt.set(ASCIILiteral("localeMatcher"), matcher);

More efficient to use the add function here rather than the set function.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:241
> +        opt.set(ASCIILiteral("kn"), numericString);

Ditto.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:247
> +        opt.set(ASCIILiteral("kf"), caseFirst);

Ditto.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:252
> +    const HashSet<String>& availableLocales = state.callee()->globalObject()->intlCollatorAvailableLocales();

I like auto& here better for the return type.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:253
> +    HashMap<String, String> result = resolveLocale(availableLocales, requestedLocales, opt, relevantExtensionKeys, sizeof(relevantExtensionKeys) / sizeof(relevantExtensionKeys[0]), localeData);

I like auto here better for the return type.

WTF_ARRAY_LENGTH is preferred over that sizeof expression. Or perhaps we can use std::array and use relevantExtensionKeys.size().

> Source/JavaScriptCore/runtime/IntlCollator.cpp:329
> +        ASSERT(!state.hadException());

Why can we assert this? What guarantees it is true. If we can assert here, why can’t we assert along the way doing each step inside initializeCollator itself?

> Source/JavaScriptCore/runtime/IntlCollator.cpp:332
> +    if (!m_collator) {

I suggest putting the code to allocate a new collator into a separate function. So the code here would be:

    if (!m_collator) {
        createCollator();
        if (!m_collator)
            return state.vm().throwException(&state, createError(&state, ASCIILiteral("error message here")));
    }

The code to call initializeCollator could also possibly go into the createCollator function for the same reason.

I think this would make this function easier to read and possible even more efficient.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:379
> +        UColAttributeValue strength;
> +        UColAttributeValue caseLevel = UCOL_OFF;
> +        switch (m_sensitivity) {
> +        case Sensitivity::Base:
> +            strength = UCOL_PRIMARY;
> +            break;
> +        case Sensitivity::Accent:
> +            strength = UCOL_SECONDARY;
> +            break;
> +        case Sensitivity::Case:
> +            strength = UCOL_PRIMARY;
> +            caseLevel = UCOL_ON;
> +            break;
> +        case Sensitivity::Variant:
> +            strength = UCOL_TERTIARY;
> +            break;
> +        default:
> +            ASSERT_NOT_REACHED();
> +        }
> +        ucol_setAttribute(collator, UCOL_STRENGTH, strength, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +        ucol_setAttribute(collator, UCOL_CASE_LEVEL, caseLevel, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +
> +        ucol_setAttribute(collator, UCOL_NUMERIC_COLLATION, m_numeric ? UCOL_ON : UCOL_OFF, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +
> +        // FIXME: Setting UCOL_ALTERNATE_HANDLING to UCOL_SHIFTED causes punctuation and whitespace to be
> +        // ignored. There is currently no way to ignore only punctuation.
> +        ucol_setAttribute(collator, UCOL_ALTERNATE_HANDLING, m_ignorePunctuation ? UCOL_SHIFTED : UCOL_DEFAULT, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +
> +        // "The method is required to return 0 when comparing Strings that are considered canonically
> +        // equivalent by the Unicode standard."
> +        ucol_setAttribute(collator, UCOL_NORMALIZATION_MODE, UCOL_ON, &status);
> +        if (U_FAILURE(status))
> +            goto fail;
> +

Is there any other place we can share this code with? It’s pretty long winded and I seem to have seen code like this before.

> Source/JavaScriptCore/runtime/IntlCollator.cpp:384
> +        return state.vm().throwException(&state, createError(&state, ASCIILiteral("ucol_setAttribute failed.")));

This does not seem like an acceptable string for an error seen by the web. When would this exception occur? Why is it useful to give the function name ucol_setAttribute to the web developer?

> Source/JavaScriptCore/runtime/IntlCollator.h:50
> +    JSValue compareStrings(ExecState&, StringView x, StringView y);

I don’t think the names “x” and “y” here add anything. Please omit them.

> Source/JavaScriptCore/runtime/IntlCollator.h:67
> +    enum class Usage {
> +        Sort,
> +        Search
> +    };

I would find this more readable as a one-liner.

> Source/JavaScriptCore/runtime/IntlCollator.h:74
> +    enum class Sensitivity {
> +        Base,
> +        Accent,
> +        Case,
> +        Variant
> +    };

I would find this more readable as a one-liner.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:98
> +    StringView x = state->argument(0).toString(state)->view(state);

This won’t work. We are creating a StringView pointing to the result of toString, but then the garbage collector is allowed to destroy that object. Instead the local variable needs to be of type JSString* and the call to view needs to be separate. I suggest calling view in the line that calls compareStrings.

> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:104
> +    StringView y = state->argument(1).toString(state)->view(state);

Ditto.

> Source/JavaScriptCore/runtime/IntlObject.cpp:118
> +    String locale(uloc_getDefault());

Is the result of uloc_getDefault guaranteed to be ASCII or Latin-1? Or could it be UTF-8? If it’s UTF-8 then we need to write this code differently.

> Source/JavaScriptCore/runtime/IntlObject.cpp:202
>              state.vm().throwException(&state, createRangeError(&state, String(notFound)));

This explicit conversion to String should not be needed. Please remove it.

> Source/JavaScriptCore/runtime/IntlObject.cpp:211
> -    return fallback;
> +    return String(fallback);

This explicit conversion to String should not be needed. Please don’t make this change.

> Source/JavaScriptCore/runtime/IntlObject.cpp:759
> -                        supportedExtensionAddition = "-" + key + '-' + value;
> +                        supportedExtensionAddition = String("-") + key + '-' + value;

More efficient idiom for this is:

    supportedExtensionAddition = makeString('-', key, '-', value);

Avoids extra work to allocate and destroy a single character string for "-".

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.webkit.org/pipermail/webkit-unassigned/attachments/20151215/3697f7a8/attachment.html>


More information about the webkit-unassigned mailing list