[webkit-reviews] review requested: [Bug 95207] CSS Parser should directly parse 8 bit source strings : [Attachment 161115] Patch with suggested changes and speculative cr-linux test fix

Tue Aug 28 18:54:52 PDT 2012

Michael Saboff <msaboff at apple.com> has asked  for review:
Bug 95207: CSS Parser should directly parse 8 bit source strings
https://bugs.webkit.org/show_bug.cgi?id=95207

Attachment 161115: Patch with suggested changes and speculative cr-linux test
fix
https://bugs.webkit.org/attachment.cgi?id=161115&action=review

------- Additional Comments from Michael Saboff <msaboff at apple.com>
(In reply to comment #7)
> (From update of attachment 161039 [details])
> View in context:
https://bugs.webkit.org/attachment.cgi?id=161039&action=review
> 
> I had a first look, I have early comments about some details.
> Can you please measure how much does the code grow with the patch?

Comparing the size output from before and after X86-64 release builds of Mac
WebCore.framework:
	  __TEXT    __DATA    __OBJC	others	     dec       hex
before- 17391616   1871872	   0  18460672	37724160   23fa000
 after- 17408000   1871872	   0  18399232	37679104   23ef000
 delta-    16384	 0	   0
This growth is less than I would have thought!

> > Source/WebCore/css/CSSParser.cpp:168
> > +template <typename CharacterType>
> > +ALWAYS_INLINE static bool equal(const CharacterType* a, const char* b, int
length)
> >  {
> > -	 for (int i = 0; i < a.length; ++i) {
> > +	 for (int i = 0; i < length; ++i) {
> >	     if (!b[i])
> >		 return false;
> > -	     if (a.characters[i] != b[i])
> > +	     if (a[i] != b[i])
> >		 return false;
> >	 }
> > -	 return !b[a.length];
> > +	 return !b[length];
> >  }
> 
> We have optimized versions of this in StringImpl.h, you should call them
directly.

Done.

> > Source/WebCore/css/CSSParser.cpp:186
> > +template <typename CharacterType>
> > +ALWAYS_INLINE static bool equalIgnoringCase(const CharacterType* a, const
char* b, int length)
> >  {
> > -	 for (int i = 0; i < a.length; ++i) {
> > +	 for (int i = 0; i < length; ++i) {
> >	     if (!b[i])
> >		 return false;
> >	     ASSERT(!isASCIIUpper(b[i]));
> > -	     if (toASCIILower(a.characters[i]) != b[i])
> > +	     if (toASCIILower(a[i]) != b[i])
> >		 return false;
> >	 }
> > -	 return !b[a.length];
> > +	 return !b[length];
> > +}
> 
> We have optimized versions of this in StringImpl.h, you should call them
directly.

Done.

> > Source/WebCore/css/CSSParser.cpp:335
> > +	 int stringLength = string.length();
> > +	 int length = stringLength + m_parsedTextPrefixLength + strlen(suffix)
+ 1;
> 
> String length are typically unsigned.

Fixed this.

> > Source/WebCore/css/CSSParser.cpp:340
> > +	     for (unsigned i = 0; i < m_parsedTextPrefixLength; i++)
> > +		 m_dataStart8[i] = prefix[i];
> 
> memcpy?

The prefixes and suffices are so small, 0 to a 33 bytes with most of them 0
that I think the call overhead would outweigh the benefit. 

> > Source/WebCore/css/CSSParser.cpp:348
> > +	     for (unsigned i = start; i < end; i++)
> > +		 m_dataStart8[i] = suffix[i - start];
> 
> memcpy?
> Or just a while loop incrementing the pointers. I think the "i - start" is a
little less readable.

Clang on X86 makes slightly better code using scaled index instructions instead
of needing to increment a pointer (an ALU operation).  I assume that the same
is true for gcc.  I can clean up the suffix index calculation, but it will
probably look best with another local.

> > Source/WebCore/css/CSSParser.cpp:350
> > +	     m_dataStart8[length - 1] = 0;
> 
> If you do:
> unsigned length = stringLength + m_parsedTextPrefixLength + strlen(suffix) +
1;
> ...
> m_dataStart8 = adoptArrayPtr(new LChar[length + 1]);
> ...
> This becomes:
> m_dataStart8[length] = 0;
> 
> I personally prefer this way of handling zero termination, but that is up to
you.

I want m_length to be the buffer length.  It is used to allocate a 16 bit
buffer when there is 8 bit source with a 16 bit escape.

> > Source/WebCore/css/CSSParser.cpp:3122
> > -	 AtomicString variableName = String(name.characters + 12, name.length -
12);
> > +	 AtomicString variableName = name.is8Bit() ? String(name.characters8()
+ 12, name.length() - 12) : String(name.characters16() + 12, name.length() -
12);
> 
> Creating an AtomicString from a String can be inefficient. You should use the
AtomicString constructor here instead.

Done.

> > Source/WebCore/css/CSSParser.cpp:8871
> > +		     UnicodeToChars(result, unicode);		 }
> 
> Lost bracket :)

Saw and fixed after original post.

> > Source/WebCore/css/CSSParser.h:489
> > +	 LChar* m_currentCharacter8;
> > +	 UChar* m_currentCharacter16;
> 
> Should this be an union? Are they mutually exclusive?

These are both needed.	For the case that we have an 8 bit source and a Unicode
escape that resolves to 16 bit, we use the 16 bit data buffer and
m_currentCharacter16 to track where in that buffer.