[webkit-reviews] review granted: [Bug 44831] The logic to escape entities in appendEscapedContent and appendAttributeValue should be merged : [Attachment 65841] cleanup

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Sat Aug 28 15:34:47 PDT 2010


Darin Adler <darin at apple.com> has granted Ryosuke Niwa <rniwa at webkit.org>'s
request for review:
Bug 44831: The logic to escape entities in appendEscapedContent and
appendAttributeValue should be merged
https://bugs.webkit.org/show_bug.cgi?id=44831

Attachment 65841: cleanup
https://bugs.webkit.org/attachment.cgi?id=65841&action=review

------- Additional Comments from Darin Adler <darin at apple.com>
The uses of the word "escape" here don't make that much sense to me. I don't
think we normally talk about entities as "escaped".

> +enum EntityEscapeFlag {
> +    EscapeNone  = 0x0000,
> +    EscapeAmp   = 0x0001,
> +    EscapeLt    = 0x0002,
> +    EscapeGt    = 0x0004,
> +    EscapeQuot  = 0x0008,
> +    EscapeNbsp  = 0x0010,
> +};

We normally avoid lining things up like this because if you change names it
ends up unaligned later.

> +static void escapeEntities(Vector<UChar>& out, const UChar* content, size_t
length, int entitiesToEscape)

The variable name "entitiesToEscape" here should have the word "mask" or
something like that in it, because otherwise it seems it could be a count.

I think that escapeEntities is not a great name for a function that appends to
a vector while turning certain characters into entity sequences. I would call
this appendCharactersAndEntities or something like that. The key is that it's
an append function, not an escape function, since it appends characters.

> +    size_t positionAfterLastEntity = 0;
> +    for (size_t i = 0; i < length; i++) {
> +	   // This loop should be unfolded by compiler

What does that comment mean? Do you mean the compiler should unroll the loop?
Do our compilers unroll the loop or not? I don't understand the purpose of the
comment.

> +	   // HashMap is an overkill here because there are only 4 entities.

This seems a bit too specific a comment. I would just leave it out.

> +	   for (size_t m = 0; m < sizeof(entityMaps) / sizeof(EntityMap); m++)
{

I don't think EntityMap is a good name for a single entry in this array. I
would call that single one an EntityDescription or something. The overall
structure is what I'd call a map, not a single entry.

> +		   out.append('&');
> +		   append(out, entityMaps[m].name);
> +		   out.append(';');

I suggest just including the "&" and the ";" in the string inside the map. The
cost would be low and we'd end up with easier to understand code and faster
logic since we would not have three separate append calls:

> +    escapeEntities(result, attribute.characters(), attribute.length(),
> +		      EscapeAmp | EscapeLt | EscapeGt | EscapeQuot |
(escapeNBSP ? EscapeNbsp : 0));

Normally we don't line up second lines like this. Instead we just indent four
spaces. If the escapeEntities function was renamed the alignment would be
screwed up. We don't like that kind of high-maintenance formatting.

> +	   appendNodeValue(out, text, m_range, EscapeAmp | EscapeLt | EscapeGt
> +			   | (text->document()->isHTMLDocument() ? EscapeNbsp :
0));

Same comment about lining things up.

> +    String content = useRenderedText ? renderedText(text, m_range) :
stringValueForRange(text, m_range);
> +    Vector<UChar> buffer;
> +    escapeEntities(buffer, content.characters(), content.length(), EscapeAmp
| EscapeLt | EscapeGt);
> +    String markup =
convertHTMLTextToInterchangeFormat(String::adopt(buffer), text);
>      append(out, markup);

I think this would read better if you eliminated the local variable named
markup.

We might want constants for the combinations of entities. Those could help
document why they are the right set of characters to encode. For example, the
minimal set for text outside of an tag is &amp; and &lt; but then we include
&gt; just because we think it's inelegant to escape one and not the other. Then
we escape quote marks to make things safe for inside an attribute value. Not
sure why we escape non-breaking spaces at all.

Using constants for the masks can help document all of this.

I'm going to say r=me on this patch but I think you may want to do quite a bit
more refinement on this before landing it, so you may want to post a new patch.


More information about the webkit-reviews mailing list