[Webkit-unassigned] [Bug 147604] [INTL] Implement Collator Compare Functions
bugzilla-daemon at webkit.org
bugzilla-daemon at webkit.org
Tue Dec 15 08:45:34 PST 2015
https://bugs.webkit.org/show_bug.cgi?id=147604
Darin Adler <darin at apple.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #267366|review? |review-
Flags| |
--- Comment #19 from Darin Adler <darin at apple.com> ---
Comment on attachment 267366
--> https://bugs.webkit.org/attachment.cgi?id=267366
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=267366&action=review
Looks like great work. Many good improvements, not just the new feature.
review- because the string object lifetime mistake in IntlCollatorFuncCompare is a serious one.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:104
> + keyLocaleData.append(String());
Alternate syntax for null in this context is { } rather than String(). I slightly prefer the former.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:125
> + keyLocaleData.append(String(collation));
This treats the keyword value strings as ASCII or Latin-1, not UTF-8. Is that correct? Also, an explicit conversion to String() is not needed; please just write append(collation).
To create the strings as UTF-8, we would use String::fromUTF8(collation) but we also have to consider that the string might be invalid UTF-8 which would result in a null string.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:150
> + keyLocaleData.append(String());
Alternate syntax for null in this context is { } rather than String(). I slightly prefer the former.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:170
> + Vector<String> requestedLocales = canonicalizeLocaleList(state, locales);
I like to use auto for the return value type in a case like this.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:220
> + opt.set(ASCIILiteral("localeMatcher"), matcher);
More efficient to use the add function here rather than the set function.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:241
> + opt.set(ASCIILiteral("kn"), numericString);
Ditto.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:247
> + opt.set(ASCIILiteral("kf"), caseFirst);
Ditto.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:252
> + const HashSet<String>& availableLocales = state.callee()->globalObject()->intlCollatorAvailableLocales();
I like auto& here better for the return type.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:253
> + HashMap<String, String> result = resolveLocale(availableLocales, requestedLocales, opt, relevantExtensionKeys, sizeof(relevantExtensionKeys) / sizeof(relevantExtensionKeys[0]), localeData);
I like auto here better for the return type.
WTF_ARRAY_LENGTH is preferred over that sizeof expression. Or perhaps we can use std::array and use relevantExtensionKeys.size().
> Source/JavaScriptCore/runtime/IntlCollator.cpp:329
> + ASSERT(!state.hadException());
Why can we assert this? What guarantees it is true. If we can assert here, why canât we assert along the way doing each step inside initializeCollator itself?
> Source/JavaScriptCore/runtime/IntlCollator.cpp:332
> + if (!m_collator) {
I suggest putting the code to allocate a new collator into a separate function. So the code here would be:
if (!m_collator) {
createCollator();
if (!m_collator)
return state.vm().throwException(&state, createError(&state, ASCIILiteral("error message here")));
}
The code to call initializeCollator could also possibly go into the createCollator function for the same reason.
I think this would make this function easier to read and possible even more efficient.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:379
> + UColAttributeValue strength;
> + UColAttributeValue caseLevel = UCOL_OFF;
> + switch (m_sensitivity) {
> + case Sensitivity::Base:
> + strength = UCOL_PRIMARY;
> + break;
> + case Sensitivity::Accent:
> + strength = UCOL_SECONDARY;
> + break;
> + case Sensitivity::Case:
> + strength = UCOL_PRIMARY;
> + caseLevel = UCOL_ON;
> + break;
> + case Sensitivity::Variant:
> + strength = UCOL_TERTIARY;
> + break;
> + default:
> + ASSERT_NOT_REACHED();
> + }
> + ucol_setAttribute(collator, UCOL_STRENGTH, strength, &status);
> + if (U_FAILURE(status))
> + goto fail;
> + ucol_setAttribute(collator, UCOL_CASE_LEVEL, caseLevel, &status);
> + if (U_FAILURE(status))
> + goto fail;
> +
> + ucol_setAttribute(collator, UCOL_NUMERIC_COLLATION, m_numeric ? UCOL_ON : UCOL_OFF, &status);
> + if (U_FAILURE(status))
> + goto fail;
> +
> + // FIXME: Setting UCOL_ALTERNATE_HANDLING to UCOL_SHIFTED causes punctuation and whitespace to be
> + // ignored. There is currently no way to ignore only punctuation.
> + ucol_setAttribute(collator, UCOL_ALTERNATE_HANDLING, m_ignorePunctuation ? UCOL_SHIFTED : UCOL_DEFAULT, &status);
> + if (U_FAILURE(status))
> + goto fail;
> +
> + // "The method is required to return 0 when comparing Strings that are considered canonically
> + // equivalent by the Unicode standard."
> + ucol_setAttribute(collator, UCOL_NORMALIZATION_MODE, UCOL_ON, &status);
> + if (U_FAILURE(status))
> + goto fail;
> +
Is there any other place we can share this code with? Itâs pretty long winded and I seem to have seen code like this before.
> Source/JavaScriptCore/runtime/IntlCollator.cpp:384
> + return state.vm().throwException(&state, createError(&state, ASCIILiteral("ucol_setAttribute failed.")));
This does not seem like an acceptable string for an error seen by the web. When would this exception occur? Why is it useful to give the function name ucol_setAttribute to the web developer?
> Source/JavaScriptCore/runtime/IntlCollator.h:50
> + JSValue compareStrings(ExecState&, StringView x, StringView y);
I donât think the names âxâ and âyâ here add anything. Please omit them.
> Source/JavaScriptCore/runtime/IntlCollator.h:67
> + enum class Usage {
> + Sort,
> + Search
> + };
I would find this more readable as a one-liner.
> Source/JavaScriptCore/runtime/IntlCollator.h:74
> + enum class Sensitivity {
> + Base,
> + Accent,
> + Case,
> + Variant
> + };
I would find this more readable as a one-liner.
> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:98
> + StringView x = state->argument(0).toString(state)->view(state);
This wonât work. We are creating a StringView pointing to the result of toString, but then the garbage collector is allowed to destroy that object. Instead the local variable needs to be of type JSString* and the call to view needs to be separate. I suggest calling view in the line that calls compareStrings.
> Source/JavaScriptCore/runtime/IntlCollatorPrototype.cpp:104
> + StringView y = state->argument(1).toString(state)->view(state);
Ditto.
> Source/JavaScriptCore/runtime/IntlObject.cpp:118
> + String locale(uloc_getDefault());
Is the result of uloc_getDefault guaranteed to be ASCII or Latin-1? Or could it be UTF-8? If itâs UTF-8 then we need to write this code differently.
> Source/JavaScriptCore/runtime/IntlObject.cpp:202
> state.vm().throwException(&state, createRangeError(&state, String(notFound)));
This explicit conversion to String should not be needed. Please remove it.
> Source/JavaScriptCore/runtime/IntlObject.cpp:211
> - return fallback;
> + return String(fallback);
This explicit conversion to String should not be needed. Please donât make this change.
> Source/JavaScriptCore/runtime/IntlObject.cpp:759
> - supportedExtensionAddition = "-" + key + '-' + value;
> + supportedExtensionAddition = String("-") + key + '-' + value;
More efficient idiom for this is:
supportedExtensionAddition = makeString('-', key, '-', value);
Avoids extra work to allocate and destroy a single character string for "-".
--
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.webkit.org/pipermail/webkit-unassigned/attachments/20151215/3697f7a8/attachment.html>
More information about the webkit-unassigned
mailing list