[webkit-changes] [WebKit/WebKit] d2928a: Improve & simplify the HTML entity parsing in the ...

Chris Dumez noreply at github.com
Thu Apr 13 16:17:22 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: d2928a3c32a3799b66bbdeb863e5f8bbb3c5972b
      https://github.com/WebKit/WebKit/commit/d2928a3c32a3799b66bbdeb863e5f8bbb3c5972b
  Author: Chris Dumez <cdumez at apple.com>
  Date:   2023-04-13 (Thu, 13 Apr 2023)

  Changed paths:
    M Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp
    M Source/WebCore/platform/text/SegmentedString.cpp
    M Source/WebCore/platform/text/SegmentedString.h
    M Source/WebCore/xml/parser/CharacterReferenceParserInlines.h
    M Tools/TestWebKitAPI/Tests/WebCore/HTMLParserIdioms.cpp

  Log Message:
  -----------
  Improve & simplify the HTML entity parsing in the HTML fast parser
https://bugs.webkit.org/show_bug.cgi?id=255366
<rdar://problem/107961494>

Reviewed by Ryosuke Niwa.

Improve & simplify the HTML entity parsing in the HTML fast parser:

Drop a lot of logic in HTMLFastPathParser::scanHTMLCharacterReference()
that tried to parse simple HTML entities, and instead rely solely on
consumeHTMLEntity().

This simplifies the code a lot and actually allows the HTML fast parser
to support more input. For example, previously, the fast parser would
fail parsing for "food & water" because it would find an '&' character
and fail to parse an HTML entity. Also, it would fail to parse some
complex cases where the HTML entity doesn't end with a semicolon
(e.g. "&nbsp&a"). The fast parser now essentially has the same behavior
as the full HTML parser when it comes to HTML entities, since it relies
on the same consumeHTMLEntity() function and uses it in the same way.
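
As a rough illustration of the two cases above, here is a minimal,
self-contained sketch of lenient entity decoding for text content. It is
not the WebKit implementation: decodeEntities() and kEntities are
hypothetical names, the table holds only three entities, and output is
assumed to be UTF-8. It only shows the intended behavior: an '&' that
does not start a known entity stays a literal character, and a legacy
entity is accepted without its trailing semicolon.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical miniature entity table (the real one is far larger).
// Legacy names are listed both with and without the trailing semicolon.
struct Entity {
    std::string_view name;
    std::string_view value;
};

static constexpr Entity kEntities[] = {
    { "amp;", "&" }, { "amp", "&" },
    { "lt;", "<" }, { "lt", "<" },
    { "nbsp;", "\xC2\xA0" }, { "nbsp", "\xC2\xA0" }, // U+00A0 encoded as UTF-8
};

// Decodes character references in text content. When nothing in the table
// matches, the '&' is emitted as-is instead of being treated as a failure.
std::string decodeEntities(std::string_view input)
{
    std::string output;
    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] != '&') {
            output += input[i++];
            continue;
        }
        std::string_view rest = input.substr(i + 1);
        std::string_view bestName;
        std::string_view bestValue;
        // Pick the longest table entry that prefixes the remaining input.
        for (const auto& entity : kEntities) {
            if (rest.substr(0, entity.name.size()) == entity.name
                && entity.name.size() > bestName.size()) {
                bestName = entity.name;
                bestValue = entity.value;
            }
        }
        if (bestName.empty()) {
            output += '&'; // Bare ampersand, e.g. "food & water".
            ++i;
            continue;
        }
        output += bestValue;
        i += 1 + bestName.size();
    }
    return output;
}
```

With this sketch, "food & water" comes back unchanged, and "&nbsp&a"
decodes to a no-break space followed by the literal text "&a".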

Note that the newly supported input is covered by the extended API test
(HTMLParserIdioms.cpp). Also note that we have a pre-existing debug
assertion in place to make sure the fast parser returns the exact same
output as the full parser. We thus have good test coverage for correctness.

For performance reasons and because the HTMLEntityParser currently only
works with SegmentedStrings, I added support for constructing a
SegmentedString from a StringView. This avoids unnecessary String
constructions just for entity parsing.

This is performance neutral on Speedometer but it makes the code simpler
and allows our HTML fast parser to deal with more complex input.

* Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp:
(WebCore::HTMLFastPathParser::scanHTMLCharacterReference):
* Source/WebCore/platform/text/SegmentedString.cpp:
(WebCore::SegmentedString::Substring::appendTo const):
* Source/WebCore/platform/text/SegmentedString.h:
(WebCore::SegmentedString::Substring::Substring):
(WebCore::SegmentedString::Substring::numberOfCharactersConsumed const):
(WebCore::SegmentedString::SegmentedString):
* Source/WebCore/xml/parser/CharacterReferenceParserInlines.h:
(WebCore::unconsumeCharacters):
* Tools/TestWebKitAPI/Tests/WebCore/HTMLParserIdioms.cpp:
(TestWebKitAPI::TEST):

Canonical link: https://commits.webkit.org/262934@main



