[webkit-reviews] review denied: [Bug 119423] Improve srcset parser : [Attachment 209108] Patch

Tue Aug 20 01:23:04 PDT 2013

Benjamin Poulain <benjamin at webkit.org> has denied Romain Perier
<romain.perier at gmail.com>'s request for review:
Bug 119423: Improve srcset parser
https://bugs.webkit.org/show_bug.cgi?id=119423

Attachment 209108: Patch
https://bugs.webkit.org/attachment.cgi?id=209108&action=review

------- Additional Comments from Benjamin Poulain <benjamin at webkit.org>
View in context: https://bugs.webkit.org/attachment.cgi?id=209108&action=review

I am going to bed, I'll finish the review on the next iteration :)

General comments:
-This looks very efficient allocation wise. (did you benchmark it?)
-You should have more variables with better names instead of reusing index and
from.
-If you can avoid loading a character twice from memory, that is great.
-We should probably extract the whole parsing code in a static function with a
explanative name (parseImagesWithScaleFromSrcsetAttribute()?). That's usually
how we document things in WebKit.

> Source/WebCore/ChangeLog:14
> +	   when parsing. That is more efficient on ARM devices and avoids
trashing the data cache.

You can probably drop the "on ARM". I have the feeling this is gonna be faster
everywhere.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:325
> +    size_t from = 0;

"from" is a little generic.
What about imageCandidateStringStart?

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:328
>  

Extra space.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:329
> +	   if (from >= srcSetAttribute.length())

You could keep srcSetAttribute.length() in a temporary variable outside the
loop. This would avoid querying for the string impl for each iterations.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:330
> +	       break;

Since this is the only break condition for the loop, why not have while(from >=
srcSetAttributeLength)?

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:335
> +	   // Collect a sequence of characters that are not space, it's the url

Missing period.

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:340
> +	   size_t index = srcSetAttribute.find(isNotHTMLSpace, from);
> +	   if (index >= separator) {
> +	       from = separator + 1;
>	       continue;
> -	   if (!data.last().endsWith('x'))
> +	   }

I would start with this, then find the separator.

The reason has to do with how you read the memory:
Let's say you have "	foo.jpg 2x". When executing "srcSetAttribute.find(',',
from)", you will go over the spaces and find the separator. Then, you execute
"srcSetAttribute.find(isNotHTMLSpace, from)" which will read the memory again
from the beginning to find the character 'f'.

If you inverses the operations. You first skip all the HTMLSpace. Then you
never have to go over them again. You take the position you found, and you look
for the separator: "srcSetAttribute.find(',', index).

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:341
> +	   from = index;

index is too generic. The previous "index" is "startOfTheURLString"

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:342
> +	   index = srcSetAttribute.find(isHTMLSpace, index);

index -> endOfTheURLString?

You could also imagine doing this before finding the separator (to avoid
loading the same characters again from memory).

In that case, "foo,bar.jpg 2x, would parse "foo,bar.jpg" as the URL. I don't
know if that is okay? (hum, do we have tests for that kind of input anyway?)

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:347
> +	   size_t url = from, spaceSep = index;

Each statement should get its own line. (WebKit coding style)

> Source/WebCore/html/parser/HTMLParserIdioms.cpp:349
> +	   // Collect a sequence of characters that are neither "," nor space,
it's the descriptor

Missing period.