[webkit-changes] [WebKit/WebKit] 629838: Parsing HTML entities shouldn't call malloc

Darin Adler noreply at github.com
Tue May 30 08:34:39 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 6298382b1e4a2c3f7b610add1612872b96904814
      https://github.com/WebKit/WebKit/commit/6298382b1e4a2c3f7b610add1612872b96904814
  Author: Darin Adler <darin at apple.com>
  Date:   2023-05-30 (Tue, 30 May 2023)

  Changed paths:
    M Source/WebCore/WebCore.xcodeproj/project.pbxproj
    M Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp
    M Source/WebCore/html/parser/HTMLEntityParser.cpp
    M Source/WebCore/html/parser/HTMLEntityParser.h
    M Source/WebCore/html/parser/HTMLTokenizer.cpp
    R Source/WebCore/xml/parser/CharacterReferenceParserInlines.h
    M Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp

  Log Message:
  -----------
  Parsing HTML entities shouldn't call malloc
https://bugs.webkit.org/show_bug.cgi?id=119921
rdar://109976279

Reviewed by Chris Dumez.

This was inspired by some work done on Chromium. While I didn't use
the Chromium patch, I did much the same work, taking advantage of the
fact that HTML entity parsing only generates a sequence of 1-3 UTF-16
code units, not arbitrary strings. Also fixed a mismatch between the
interface and the needs of the fast path HTML parser.
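
As a rough illustration of why a bounded result removes the need for heap
allocation, here is a minimal sketch of a fixed-capacity result type. The
names DecodedEntitySketch and encodeScalarSketch are hypothetical and are not
the types added by this commit; the capacity of 3 mirrors the bound stated
above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a fixed-capacity decode result.
// This is not WebKit's DecodedHTMLEntity; it only illustrates the idea.
struct DecodedEntitySketch {
    std::array<char16_t, 3> units {}; // inline storage, no heap allocation
    std::size_t length = 0;           // number of valid entries in units
};

// Encodes one Unicode scalar value as one or two UTF-16 code units.
// Assumption: the caller passes a valid scalar value (not a surrogate,
// not above U+10FFFF); this sketch does no validation.
inline DecodedEntitySketch encodeScalarSketch(char32_t c)
{
    DecodedEntitySketch result;
    if (c < 0x10000) {
        result.units[0] = static_cast<char16_t>(c);
        result.length = 1;
    } else {
        c -= 0x10000;
        result.units[0] = static_cast<char16_t>(0xD800 + (c >> 10));   // lead surrogate
        result.units[1] = static_cast<char16_t>(0xDC00 + (c & 0x3FF)); // trail surrogate
        result.length = 2;
    }
    return result;
}
```

Because the storage lives inside the returned object, a caller can append
the first length entries of units to its output without any temporary string.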

* Source/WebCore/WebCore.xcodeproj/project.pbxproj: Removed
CharacterReferenceParserInlines.h.

* Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp:
(WebCore::HTMLFastPathParser::scanHTMLCharacterReference): Use consumeHTMLEntity
function that takes a StringParsingBuffer. This eliminates the need for a
temporary SegmentedString, and resolves the FIXME that was here.

* Source/WebCore/html/parser/HTMLEntityParser.cpp:
(WebCore::DecodedHTMLEntity::DecodedHTMLEntity): Added constructors for the
class used for the return type.
(WebCore::makeEntity): Added. Converts a UChar32, Checked<UChar32>, or
HTMLEntityTableEntry into a DecodedHTMLEntity.
(WebCore::SegmentedStringSource): Added. Adapter for SegmentedString so we can
share a single set of parser functions.
(WebCore::StringParsingBufferSource): Added. Adapter for StringParsingBuffer.
(WebCore::consumeDecimalHTMLEntity): Added. Refactored from code formerly
in CharacterReferenceParserInlines.h.
(WebCore::consumeHexHTMLEntity): Ditto.
(WebCore::consumeNamedHTMLEntity): Added. Refactored from code formerly
in HTMLEntityParser::consumeNamedEntity.
(WebCore::consumeHTMLEntity): Added. Refactored from code formerly
in CharacterReferenceParserInlines.h.
(WebCore::decodeNamedHTMLEntityForXMLParser): Renamed from
decodeNamedEntityToUCharArray. We now take a std::array& for safety so it's
no longer necessary to put the data type in the function name.
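
The adapter approach described for SegmentedStringSource and
StringParsingBufferSource can be sketched roughly as follows: two small
source classes expose the same member functions, and one function template
parses from either. All names here (ViewSourceSketch, SegmentedSourceSketch,
consumeDecimalSketch) and the atEnd/current/advance interface are assumptions
for illustration, not the interface used in HTMLEntityParser.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical adapter over one contiguous buffer.
class ViewSourceSketch {
public:
    explicit ViewSourceSketch(std::u16string_view text) : m_text(text) { }
    bool atEnd() const { return m_position >= m_text.size(); }
    char16_t current() const { return m_text[m_position]; }
    void advance() { ++m_position; }
private:
    std::u16string_view m_text;
    std::size_t m_position = 0;
};

// Hypothetical adapter over input that arrives in several pieces.
class SegmentedSourceSketch {
public:
    explicit SegmentedSourceSketch(std::vector<std::u16string_view> segments)
        : m_segments(std::move(segments)) { skipEmptySegments(); }
    bool atEnd() const { return m_segment >= m_segments.size(); }
    char16_t current() const { return m_segments[m_segment][m_offset]; }
    void advance() { ++m_offset; skipEmptySegments(); }
private:
    // Moves to the next segment whenever the current one is exhausted.
    void skipEmptySegments()
    {
        while (m_segment < m_segments.size() && m_offset >= m_segments[m_segment].size()) {
            ++m_segment;
            m_offset = 0;
        }
    }
    std::vector<std::u16string_view> m_segments;
    std::size_t m_segment = 0;
    std::size_t m_offset = 0;
};

// One parser shared by both adapters. Consumes ASCII decimal digits and
// returns their value, clamped to 0x110000 so it cannot overflow.
// Returns std::nullopt if no digit was present.
template<typename Source>
std::optional<std::uint32_t> consumeDecimalSketch(Source& source)
{
    constexpr std::uint32_t clampValue = 0x110000; // one past the last code point
    std::uint32_t value = 0;
    bool sawDigit = false;
    while (!source.atEnd() && source.current() >= u'0' && source.current() <= u'9') {
        sawDigit = true;
        value = value * 10 + static_cast<std::uint32_t>(source.current() - u'0');
        if (value > clampValue)
            value = clampValue;
        source.advance();
    }
    if (!sawDigit)
        return std::nullopt;
    return value;
}
```

The template is instantiated once per source type, so the parsing logic is
written a single time while each input representation keeps its own cursor.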

* Source/WebCore/html/parser/HTMLEntityParser.h: Updated includes.
Added a new DecodedHTMLEntity type for the return value from the parser.
Got rid of out parameters and put the error cases in the return value.
Another alternative would have been std::expected.
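
To illustrate the shape of an interface that reports errors in the return
value rather than through out parameters, here is a minimal sketch. The
names DecodeStatusSketch, DecodeResultSketch, and decodeDecimalReferenceSketch
are hypothetical, and the parsing is deliberately simplified: it handles only
decimal references in the Basic Multilingual Plane, requires the terminating
semicolon, and applies none of the HTML specification's replacement rules.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical status carried inside the return value.
enum class DecodeStatusSketch { Decoded, Failed, NotEnoughCharacters };

// Hypothetical result type: status plus inline storage for the output.
struct DecodeResultSketch {
    DecodeStatusSketch status = DecodeStatusSketch::Failed;
    std::array<char16_t, 3> units {};
    std::size_t length = 0;
};

// Decodes the text that follows "&#" in a decimal character reference.
// Simplifications (assumptions of this sketch): BMP values only, and the
// terminating ';' is mandatory.
inline DecodeResultSketch decodeDecimalReferenceSketch(std::u16string_view afterHash)
{
    DecodeResultSketch result; // starts out as Failed
    std::uint32_t value = 0;
    std::size_t i = 0;
    while (i < afterHash.size() && afterHash[i] >= u'0' && afterHash[i] <= u'9') {
        value = value * 10 + static_cast<std::uint32_t>(afterHash[i] - u'0');
        if (value > 0xFFFF)
            return result; // outside the range this sketch supports
        ++i;
    }
    if (i == afterHash.size()) {
        // Input ended before a terminator; more data might complete it.
        result.status = DecodeStatusSketch::NotEnoughCharacters;
        return result;
    }
    if (!i || afterHash[i] != u';')
        return result; // no digits, or an unexpected terminator
    result.units[0] = static_cast<char16_t>(value);
    result.length = 1;
    result.status = DecodeStatusSketch::Decoded;
    return result;
}
```

A caller checks status once and then reads units, so there is no separate
boolean out parameter or output buffer to keep in sync with the return value.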

* Source/WebCore/html/parser/HTMLTokenizer.cpp:
(WebCore::HTMLTokenizer::processEntity): Updated for changes to consumeHTMLEntity.
(WebCore::HTMLTokenizer::processToken): Ditto.

* Source/WebCore/xml/parser/CharacterReferenceParserInlines.h: Removed.

* Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp:
(WebCore::convertUTF16EntityToUTF8): Updated to use std::span.
(WebCore::getXHTMLEntity): Updated for decodeNamedHTMLEntityForXMLParser.

Canonical link: https://commits.webkit.org/264675@main



