[webkit-dev] Slow idioms with WTF::String

Darin Adler darin at apple.com
Tue Jul 12 10:25:17 PDT 2011


Hi folks.

The key to fast use of WTF::String is to avoid creating temporary WTF::StringImpl objects or temporary copies of string data.

With the latest enhancements to WTF::String, here are the preferred fast ways to build a new string:

    - A single expression with the + operator and arguments of type WTF::String, char, UChar, const char*, const UChar*, Vector<char>, and WTF::AtomicString.
    - A call to the WTF::makeString function.
    - An expression that uses a single function on the string, or uses the + operator exactly once, or the += operator with the types it supports directly.
    - WTF::StringBuilder, in cases where computing the pieces of the string involves complex branching or requires a loop.
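
As a rough illustration of why a single expression or a single makeString call can be fast, here is a minimal sketch in standard C++ of the underlying idea: measure every piece first, size the result once, then copy each piece into place. The names concatOnce, pieceLength, appendPiece, totalLength, and appendAll are hypothetical, and std::string stands in for WTF::String; this is not how WTF implements makeString.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length of each supported piece type (hypothetical helpers, not WTF API).
inline std::size_t pieceLength(const std::string& s) { return s.size(); }
inline std::size_t pieceLength(const char* s) { return std::strlen(s); }
inline std::size_t pieceLength(char) { return 1; }

// Copy one piece onto the end of the output buffer.
inline void appendPiece(std::string& out, const std::string& s) { out.append(s); }
inline void appendPiece(std::string& out, const char* s) { out.append(s); }
inline void appendPiece(std::string& out, char c) { out.push_back(c); }

// Pass 1: add up the lengths of all pieces.
inline std::size_t totalLength() { return 0; }
template <typename First, typename... Rest>
std::size_t totalLength(const First& first, const Rest&... rest)
{
    return pieceLength(first) + totalLength(rest...);
}

// Pass 2: copy every piece, in order, into the output buffer.
inline void appendAll(std::string&) { }
template <typename First, typename... Rest>
void appendAll(std::string& out, const First& first, const Rest&... rest)
{
    appendPiece(out, first);
    appendAll(out, rest...);
}

// Joins all pieces; the buffer is sized once, so no intermediate
// strings are created along the way.
template <typename... Pieces>
std::string concatOnce(const Pieces&... pieces)
{
    const std::size_t total = totalLength(pieces...);
    std::string result;
    result.reserve(total);
    appendAll(result, pieces...);
    assert(result.size() == total);
    return result;
}
```

A call such as concatOnce(prefix, "12", 'p', 'x') accepts a mix of string, const char*, and char arguments, loosely mirroring the argument types listed above for the + operator.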

Here are acceptable, but not preferred, ways to build a new string:

    - Building up a Vector<UChar> followed by WTF::String::adopt. I believe StringBuilder is always better, so we should probably retire this idiom.

Inefficient ways to build a new string include applying the following more than once, or in combination, to the same string:

    - WTF::String::append.
    - The += operator.
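
To make the cost concrete, here is a toy model in standard C++. It assumes a string whose character buffer cannot grow in place, so every append has to build a new buffer holding the old text plus the new piece; that is the kind of temporary copy described at the top of this message. JoinResult, joinByRepeatedAppend, and joinInOnePass are hypothetical names, and std::string is used only as storage; this is a sketch of the cost model, not WTF code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct JoinResult {
    std::string text;
    std::size_t buffersCreated;
    std::size_t charsCopied;
};

// Models repeated append/+= on a string with an immutable buffer:
// each step builds a fresh buffer and recopies everything built so far.
JoinResult joinByRepeatedAppend(const std::vector<std::string>& pieces)
{
    JoinResult r{std::string(), 0, 0};
    for (const std::string& piece : pieces) {
        std::string next;
        next.reserve(r.text.size() + piece.size());
        next.append(r.text); // old characters are copied again
        next.append(piece);
        r.text = std::move(next);
        r.buffersCreated += 1;
        r.charsCopied += r.text.size();
    }
    return r;
}

// Models the preferred approach: measure first, then fill one buffer.
JoinResult joinInOnePass(const std::vector<std::string>& pieces)
{
    std::size_t total = 0;
    for (const std::string& piece : pieces)
        total += piece.size();

    JoinResult r{std::string(), 1, 0};
    r.text.reserve(total);
    for (const std::string& piece : pieces)
        r.text.append(piece);
    r.charsCopied = r.text.size();
    assert(r.charsCopied == total);
    return r;
}
```

For four two-character pieces, the repeated-append model creates 4 buffers and copies 20 characters, while the one-pass model creates 1 buffer and copies 8. The gap grows quadratically with the number of pieces.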

There are other operations that modify the WTF::String; none of those are efficient if the string in question is then modified further.

    - WTF::String::insert.
    - WTF::String::replace.

In addition, there are quite a few operations that return a WTF::String, and none of those are efficient if the string in question is then modified further.

    - WTF::String::number.
    - WTF::String::substring.
    - WTF::String::left.
    - WTF::String::right.
    - WTF::String::lower.
    - WTF::String::upper.
    - WTF::String::stripWhiteSpace.
    - WTF::String::simplifyWhiteSpace.
    - WTF::String::removeCharacters.
    - WTF::String::foldCase.
    - WTF::String::format.
    - WTF::String::fromUTF8.

One reason I bring this up is that if we wanted to make combinations of these more efficient, we might be able to use techniques similar to those used in StringOperators.h to make it so the entire result string is built at one time, eliminating unnecessary copies of the string characters and intermediate StringImpl objects on the heap.
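
For readers who have not looked at that kind of technique, here is a minimal sketch of the general idea (expression templates) in standard C++. The + operator returns a small object that only records its operands; the characters are copied once, when the whole expression is converted to a string. Piece, Concat, and piece are hypothetical names and std::string stands in for WTF::String, so this shows the general pattern rather than the actual contents of StringOperators.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Leaf node: refers to an existing string without copying it.
// The referenced string must outlive the whole expression.
struct Piece {
    const std::string& text;
    std::size_t length() const { return text.size(); }
    void writeTo(std::string& out) const { out.append(text); }
};

// Interior node: remembers "left + right" without building anything yet.
template <typename Left, typename Right>
struct Concat {
    Left left;
    Right right;

    std::size_t length() const { return left.length() + right.length(); }
    void writeTo(std::string& out) const
    {
        left.writeTo(out);
        right.writeTo(out);
    }

    // The result is built here, in one pass, into a buffer sized up front.
    operator std::string() const
    {
        std::string result;
        result.reserve(length());
        writeTo(result);
        assert(result.size() == length());
        return result;
    }
};

// Wraps a string so it can take part in a deferred concatenation.
inline Piece piece(const std::string& s) { return Piece{s}; }

inline Concat<Piece, Piece> operator+(Piece a, Piece b) { return {a, b}; }

template <typename L, typename R>
Concat<Concat<L, R>, Piece> operator+(Concat<L, R> a, Piece b) { return {a, b}; }
```

With named strings a, b, and c, writing std::string s = piece(a) + piece(b) + piece(c); creates no intermediate string for a + b. Extending the same pattern to operations such as substring or lower would need additional node types, which this sketch does not attempt.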

It would be interesting to find out how often the inefficient idioms are used. Until recently, there was no significantly better alternative to the inefficient idioms, so it’s highly likely we have them in multiple places.

A quick grep showed me inefficient uses of += in XMLDocumentParser::handleError, XPath::FunTranslate::evaluate, parseRFC822HeaderFields, InspectorStyleSheet::addRule, drawElementTitle in DOMNodeHighlighter.cpp, WebKitCSSTransformValue::cssText, CSSSelector::selectorText, CSSPrimitiveValue::cssText, CSSBorderImageValue::cssText, and CSSParser::createKeyframeRule.

I would not be surprised if at least some of these show up immediately with the right kind of performance test. The CSS parsing and serialization functions seem almost certain to be measurably slow.

I’m looking for two related things:

    1) A clean way to find and root out uses of the inefficient idioms that we can work on together as a team.

    2) Some ways to further refine WTF::String so it’s harder to “use it wrong”. I don’t have any immediate steps in mind, but one possibility would be to remove functions that are usually part of poorly-performing idioms, pushing WebKit programmers subtly in the direction of operations that don’t build intermediate strings.

    -- Darin


