[Webkit-unassigned] [Bug 67081] Basic enhancements to StringBuilder

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Sat Sep 17 00:48:53 PDT 2011


https://bugs.webkit.org/show_bug.cgi?id=67081





--- Comment #11 from Xianzhu Wang <wangxianzhu at chromium.org>  2011-09-17 00:48:53 PST ---
(In reply to comment #9)
> (In reply to comment #8)
> > After comparing the cost, I chose to use AtomicString(toString()) and AtomicString(toStringPreserveCapacity()) in toAtomicString() and toAtomicStringPreserveCapacity() respectively.
> 
> That is the wrong choice when the string happens to be something that’s already in the AtomicString table. In that case, we should do no memory allocation. Only if the string is not in the table do we want to do the normal toString work.
>
> > If they don't use toString() and toStringPreserveCapacity(), they still need shrinkToFit() [(for toAtomicString() and something like ...]
> 
> That’s wrong. We can and should create a code path that finds the existing AtomicString without modifying the StringBuilder at all.

I think the following could help us evaluate the cost and choose the best solution:

The cases:
1. string in table
1.1 buffer much over-allocated (that is, shrinkToFit() will allocate new space and copy)
1.2 buffer just fit or over-allocation<20% (that is, shrinkToFit() will do nothing.)
1.2.1 the StringBuilder happens to have only appended one AtomicString (that is, m_string is an AtomicString)
2. string not in table
Descriptions of 2.1, 2.2 and 2.2.1 are the same as 1.1, 1.2 and 1.2.1 respectively.

The implementations:
A. AtomicString(toString())
A1. Use m_string if it is not null, otherwise shrinkToFit() and construct AtomicString over m_buffer or substring of m_buffer (that is, the similar thing in reifyString()).
B. AtomicString(toStringPreserveCapacity())
B1. same as A1 except no shrinkToFit()
C. AtomicString(characters(), length())

       A/A1                       B/B1                  C
1.1    unnecessary shrinkToFit()  trivial cost/good     good
1.2    trivial cost/good          trivial cost/good     good
1.2.1  good                       good                  unnecessary hash
2.1    good                       extra memory space    string space can't be shared
2.2    good                       good                  string space can't be shared
2.2.1  good                       good                  unnecessary hash and allocation

*trivial cost: the cost to set the value of m_string to m_buffer or substring of m_buffer.

Please correct me if I missed anything.

(In reply to comment #10)
> (In reply to comment #7)
> > (In reply to comment #6)
> > 2. m_string.impl() is a substring of m_buffer, and the append() doesn't exceed capacity. As append() will only change the content after original m_length, m_string will remain unchanged.
> 
> Right, that’s the problematic case. The second call to builder.toString() will return the same string as the first, a string that does not include the last appended character. Unless I’m missing something. Did you actually test this case?
> 

Sorry, "'m_string' will remain unchanged" should be "the original result of toString() will remain unchanged and m_string will be set to null".

The unit tests have covered the case.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


More information about the webkit-unassigned mailing list