[webkit-changes] [WebKit/WebKit] 629838: Parsing HTML entities shouldn't call malloc
Darin Adler
noreply at github.com
Tue May 30 08:34:39 PDT 2023
Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 6298382b1e4a2c3f7b610add1612872b96904814
https://github.com/WebKit/WebKit/commit/6298382b1e4a2c3f7b610add1612872b96904814
Author: Darin Adler <darin at apple.com>
Date: 2023-05-30 (Tue, 30 May 2023)
Changed paths:
M Source/WebCore/WebCore.xcodeproj/project.pbxproj
M Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp
M Source/WebCore/html/parser/HTMLEntityParser.cpp
M Source/WebCore/html/parser/HTMLEntityParser.h
M Source/WebCore/html/parser/HTMLTokenizer.cpp
R Source/WebCore/xml/parser/CharacterReferenceParserInlines.h
M Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp
Log Message:
-----------
Parsing HTML entities shouldn't call malloc
https://bugs.webkit.org/show_bug.cgi?id=119921
rdar://109976279
Reviewed by Chris Dumez.
This was inspired by some work done on Chromium. While I didn't use
the Chromium patch, I did much the same work, taking advantage of the
fact that HTML entity parsing only generates a sequence of 1-3 UTF-16
code units, not arbitrary strings. Also fixed a mismatch between the
interface and the needs of the fast path HTML parser.
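The core idea can be sketched as follows, assuming the decoded result only ever needs a handful of UTF-16 code units. `InlineEntityCharacters` is a hypothetical name and `char16_t` stands in for WebKit's `UChar`. This is not the commit's `DecodedHTMLEntity` class. It only illustrates why a fixed-size value avoids calling malloc.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical result type: up to three UTF-16 code units stored inline,
// so producing a decoded entity never needs a heap-allocated string.
class InlineEntityCharacters {
public:
    InlineEntityCharacters() = default; // empty: nothing was decoded

    // Precondition: at most three code units.
    InlineEntityCharacters(std::initializer_list<char16_t> units)
    {
        assert(units.size() <= m_units.size());
        for (char16_t unit : units)
            m_units[m_length++] = unit;
    }

    std::size_t length() const { return m_length; }

    char16_t operator[](std::size_t index) const
    {
        assert(index < m_length);
        return m_units[index];
    }

private:
    std::array<char16_t, 3> m_units {};
    std::uint8_t m_length { 0 };
};

// The whole result is a few bytes and can be returned by value.
static_assert(sizeof(InlineEntityCharacters) <= 8);
```

Because the capacity is fixed, returning the result by value copies a few bytes. The alternative allocates and frees a string buffer for every entity.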
* Source/WebCore/WebCore.xcodeproj/project.pbxproj: Removed
CharacterReferenceParserInlines.h.
* Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp:
(WebCore::HTMLFastPathParser::scanHTMLCharacterReference): Use consumeHTMLEntity
function that takes a StringParsingBuffer. This eliminates the need for a
temporary SegmentedString, and resolves the FIXME that was here.
* Source/WebCore/html/parser/HTMLEntityParser.cpp:
(WebCore::DecodedHTMLEntity::DecodedHTMLEntity): Added constructors for the
class used for the return type.
(WebCore::makeEntity): Added. Converts a UChar32, Checked<UChar32>, or
HTMLEntityTableEntry into a DecodedHTMLEntity.
(WebCore::SegmentedStringSource): Added. Adapter for SegmentedString so we can
share a single set of parser functions.
(WebCore::StringParsingBufferSource): Added. Adapter for StringParsingBuffer.
(WebCore::consumeDecimalHTMLEntity): Added. Refactored from code formerly
in CharacterReferenceParserInlines.h.
(WebCore::consumeHexHTMLEntity): Ditto.
(WebCore::consumeNamedHTMLEntity): Added. Refactored from code formerly
in HTMLEntityParser::consumeNamedEntity.
(WebCore::consumeHTMLEntity): Added. Refactored from code formerly
in CharacterReferenceParserInlines.h.
(WebCore::decodeNamedHTMLEntityForXMLParser): Renamed from
decodeNamedEntityToUCharArray. We now take a std::array& for safety so it's
no longer necessary to put the data type in the function name.
* Source/WebCore/html/parser/HTMLEntityParser.h: Updated includes.
Added a new DecodedHTMLEntity type for the return value from the parser.
Got rid of out parameters and put the error cases in the return value.
Another alternative would have been std::expected.
* Source/WebCore/html/parser/HTMLTokenizer.cpp:
(WebCore::HTMLTokenizer::processEntity): Updated for changes to consumeHTMLEntity.
(WebCore::HTMLTokenizer::processToken): Ditto.
* Source/WebCore/xml/parser/CharacterReferenceParserInlines.h: Removed.
* Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp:
(WebCore::convertUTF16EntityToUTF8): Updated to use std::span.
(WebCore::getXHTMLEntity): Updated for decodeNamedHTMLEntityForXMLParser.
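The `makeEntity` entry describes converting a `UChar32` into the decoded result. A code point above U+FFFF needs a surrogate pair, which is one reason the result can hold more than one UTF-16 code unit. Here is a minimal sketch of that conversion. `encodeAsUTF16` and `UTF16Units` are hypothetical names, `char16_t`/`char32_t` stand in for `UChar`/`UChar32`, and the caller is assumed to have already rejected surrogates and values above U+10FFFF.

```cpp
#include <array>
#include <cassert>

// Hypothetical fixed-size output: one code point needs at most two UTF-16 units.
struct UTF16Units {
    std::array<char16_t, 2> units {};
    int length { 0 };
};

// Encode one Unicode scalar value as UTF-16 without allocating.
UTF16Units encodeAsUTF16(char32_t codePoint)
{
    assert(codePoint <= 0x10FFFF);
    UTF16Units result;
    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: a single code unit.
        result.units[0] = static_cast<char16_t>(codePoint);
        result.length = 1;
    } else {
        // Supplementary plane: split the 20-bit offset into a surrogate pair.
        char32_t offset = codePoint - 0x10000;
        result.units[0] = static_cast<char16_t>(0xD800 + (offset >> 10));  // high surrogate
        result.units[1] = static_cast<char16_t>(0xDC00 + (offset & 0x3FF)); // low surrogate
        result.length = 2;
    }
    return result;
}
```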
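The `SegmentedStringSource` and `StringParsingBufferSource` entries describe adapters that let one set of parser functions run over either input type. The sketch below shows that pattern with a function template. `ContiguousSource` (loosely like `StringParsingBuffer`) and `ChunkedSource` (loosely like `SegmentedString`, which holds several segments) are hypothetical. Their `atEnd`/`current`/`advance` interface is an assumption, not the real adapters' interface.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical adapter over one contiguous buffer.
class ContiguousSource {
public:
    explicit ContiguousSource(std::u16string_view text)
        : m_text(text)
    {
    }
    bool atEnd() const { return m_position >= m_text.size(); }
    char16_t current() const { return m_text[m_position]; }
    void advance() { ++m_position; }

private:
    std::u16string_view m_text;
    std::size_t m_position { 0 };
};

// Hypothetical adapter over a list of segments, skipping empty ones.
class ChunkedSource {
public:
    explicit ChunkedSource(std::vector<std::u16string_view> segments)
        : m_segments(std::move(segments))
    {
        skipEmptySegments();
    }
    bool atEnd() const { return m_segment >= m_segments.size(); }
    char16_t current() const { return m_segments[m_segment][m_offset]; }
    void advance()
    {
        if (++m_offset >= m_segments[m_segment].size()) {
            ++m_segment;
            m_offset = 0;
            skipEmptySegments();
        }
    }

private:
    void skipEmptySegments()
    {
        while (m_segment < m_segments.size() && m_segments[m_segment].empty())
            ++m_segment;
    }
    std::vector<std::u16string_view> m_segments;
    std::size_t m_segment { 0 };
    std::size_t m_offset { 0 };
};

// One parser routine written once and instantiated for either adapter:
// consumes leading ASCII digits and returns how many it consumed.
template<typename Source>
std::size_t consumeDigits(Source& source)
{
    std::size_t count = 0;
    while (!source.atEnd() && source.current() >= u'0' && source.current() <= u'9') {
        ++count;
        source.advance();
    }
    return count;
}
```

The template is instantiated separately for each adapter, so there is no virtual dispatch per character.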
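For `consumeDecimalHTMLEntity` and `consumeHexHTMLEntity`, the main subtlety is that a numeric reference can have arbitrarily many digits, so accumulation must not overflow. The `Checked<UChar32>` in the `makeEntity` entry suggests the real code uses checked arithmetic. Here is a hedged sketch of the digit loop under simplified rules: values that are zero, surrogates, or above U+10FFFF become U+FFFD. The HTML specification's remapping of 0x80-0x9F is left out. `parseNumericReference` and `digitValue` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Returns the value of `c` as a digit in `base` (10 or 16), or -1.
int digitValue(char16_t c, int base)
{
    if (c >= u'0' && c <= u'9')
        return c - u'0';
    if (base == 16 && c >= u'a' && c <= u'f')
        return c - u'a' + 10;
    if (base == 16 && c >= u'A' && c <= u'F')
        return c - u'A' + 10;
    return -1;
}

// Hypothetical parser for the digits after "&#" (base 10) or "&#x" (base 16).
// Advances `position` past every digit; returns std::nullopt if there were none.
std::optional<char32_t> parseNumericReference(std::u16string_view text, std::size_t& position, int base)
{
    const std::size_t start = position;
    std::uint32_t value = 0;
    bool outOfRange = false;
    while (position < text.size()) {
        int digit = digitValue(text[position], base);
        if (digit < 0)
            break;
        // Stop accumulating once past U+10FFFF. Since value <= 0x10FFFF before
        // this step, value * 16 + 15 always fits in 32 bits and cannot wrap.
        if (!outOfRange) {
            value = value * static_cast<std::uint32_t>(base) + static_cast<std::uint32_t>(digit);
            outOfRange = value > 0x10FFFF;
        }
        ++position;
    }
    if (position == start)
        return std::nullopt;
    // Simplified error handling; the 0x80-0x9F remapping is omitted.
    if (outOfRange || value == 0 || (value >= 0xD800 && value <= 0xDFFF))
        return char32_t { 0xFFFD };
    return static_cast<char32_t>(value);
}
```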
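`consumeNamedHTMLEntity` has to find the longest entity name that prefixes the input, because some legacy names are also recognized without a trailing semicolon (for example `amp` as well as `amp;`). Below is a minimal sketch with a five-entry table and a linear scan. The HTML named character reference list has over 2,000 entries, and the real table is presumably searched more cleverly. `EntityEntry` and `matchNamedEntity` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical table entry: name as it appears after '&', plus its code points.
struct EntityEntry {
    std::u16string_view name;
    std::array<char32_t, 2> codePoints;
    std::size_t codePointCount;
};

// Tiny stand-in table; "amp" without ';' mimics a legacy entry.
constexpr std::array<EntityEntry, 5> entityTable { {
    { u"amp", { U'&', 0 }, 1 },
    { u"amp;", { U'&', 0 }, 1 },
    { u"fjlig;", { U'f', U'j' }, 2 },
    { u"gt;", { U'>', 0 }, 1 },
    { u"lt;", { U'<', 0 }, 1 },
} };

// Returns the entry whose name is the longest prefix of `text`, or nullptr.
const EntityEntry* matchNamedEntity(std::u16string_view text)
{
    const EntityEntry* best = nullptr;
    for (const auto& entry : entityTable) {
        bool isPrefix = text.substr(0, entry.name.size()) == entry.name;
        if (isPrefix && (!best || entry.name.size() > best->name.size()))
            best = &entry;
    }
    return best;
}
```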
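The `decodeNamedHTMLEntityForXMLParser` entry notes that taking a `std::array&` makes the buffer's size part of the parameter type. The following is a hypothetical illustration of that point. `copyDecodedUnits`, `DecodedBuffer`, and the capacity of 4 are assumptions, not the real function's signature.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Assumed capacity for illustration; not taken from WebKit.
constexpr std::size_t decodedCapacity = 4;
using DecodedBuffer = std::array<char16_t, decodedCapacity>;

// Hypothetical helper: the output's capacity comes from its type, so the
// function bounds its writes without trusting a separate length argument.
std::size_t copyDecodedUnits(std::u16string_view units, DecodedBuffer& out)
{
    std::size_t written = 0;
    for (; written < units.size() && written < out.size(); ++written)
        out[written] = units[written];
    return written;
}
```

Passing a plain `char16_t[2]` or a differently sized `std::array` fails to compile, whereas a raw `UChar*` parameter would accept any pointer.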
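The HTMLEntityParser.h entry replaces out parameters with error cases carried in the return value. Here is a minimal sketch of that calling convention. `EntityResult`, its three states, and the toy `decodeAmpersandEntity` decoder are all hypothetical; the real type's states may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical result type: failure modes are part of the return value
// rather than reported through bool& out parameters.
class EntityResult {
public:
    enum class Status : std::uint8_t { Decoded, NotAnEntity, NeedMoreInput };

    static EntityResult decoded(char32_t codePoint) { return EntityResult(Status::Decoded, codePoint); }
    static EntityResult notAnEntity() { return EntityResult(Status::NotAnEntity, 0); }
    static EntityResult needMoreInput() { return EntityResult(Status::NeedMoreInput, 0); }

    Status status() const { return m_status; }
    char32_t codePoint() const
    {
        assert(m_status == Status::Decoded);
        return m_codePoint;
    }

private:
    EntityResult(Status status, char32_t codePoint)
        : m_status(status)
        , m_codePoint(codePoint)
    {
    }

    Status m_status;
    char32_t m_codePoint;
};

// Toy decoder that only knows "&amp;", to show the calling convention.
EntityResult decodeAmpersandEntity(std::u16string_view text)
{
    constexpr std::u16string_view name = u"&amp;";
    if (text.size() < name.size()) {
        // A truncated prefix of the name might still complete in the next chunk.
        return name.substr(0, text.size()) == text ? EntityResult::needMoreInput() : EntityResult::notAnEntity();
    }
    return text.substr(0, name.size()) == name ? EntityResult::decoded(U'&') : EntityResult::notAnEntity();
}
```

With C++23's `std::expected<char32_t, SomeError>` (where `SomeError` is a hypothetical error enum), the value and the failure reason would travel together in a standard type instead. That is the alternative the entry mentions.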
Canonical link: https://commits.webkit.org/264675@main