[webkit-changes] [WebKit/WebKit] 569a8b: Implement basic infrastructure to extract (primari...
Wenson Hsieh
noreply at github.com
Fri Jan 26 22:58:37 PST 2024
Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 569a8bd61f85e335544610a543eb86b808b09fa4
https://github.com/WebKit/WebKit/commit/569a8bd61f85e335544610a543eb86b808b09fa4
Author: Wenson Hsieh <wenson_hsieh at apple.com>
Date: 2024-01-26 (Fri, 26 Jan 2024)
Changed paths:
M Source/WebCore/CMakeLists.txt
M Source/WebCore/Headers.cmake
M Source/WebCore/Sources.txt
M Source/WebCore/WebCore.xcodeproj/project.pbxproj
M Source/WebCore/page/Page.cpp
A Source/WebCore/page/text-extraction/TextExtraction.cpp
A Source/WebCore/page/text-extraction/TextExtraction.h
A Source/WebCore/page/text-extraction/TextExtractionTypes.h
M Source/WebKit/Scripts/webkit/messages.py
M Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in
M Source/WebKit/UIProcess/WebPageProxy.cpp
M Source/WebKit/UIProcess/WebPageProxy.h
M Source/WebKit/WebProcess/WebPage/WebPage.cpp
M Source/WebKit/WebProcess/WebPage/WebPage.h
M Source/WebKit/WebProcess/WebPage/WebPage.messages.in
Log Message:
-----------
Implement basic infrastructure to extract (primarily text) content from webpages
https://bugs.webkit.org/show_bug.cgi?id=268171
rdar://121132162
Reviewed by Aditya Keerthi.
Add infrastructure to WebKit to extract visible text from web content for (eventual) donation to
system services. No change in behavior (yet).
* Source/WebCore/CMakeLists.txt:
* Source/WebCore/Headers.cmake:
* Source/WebCore/Sources.txt:
* Source/WebCore/WebCore.xcodeproj/project.pbxproj:
* Source/WebCore/page/Page.cpp:
* Source/WebCore/page/text-extraction/TextExtraction.cpp: Added.
(WebCore::TextExtraction::collectText):
Add a utility function to collect text over the entire document, and then recursively walk the DOM
to collect any other elements that are interesting for the purposes of text extraction. Note that
this skips subframes for the time being and doesn't handle `RemoteFrame`; support for both will be
added in subsequent patches.
(WebCore::TextExtraction::shouldIncludeChildren):
(WebCore::TextExtraction::rootViewBounds):
(WebCore::TextExtraction::extractItemData):
(WebCore::TextExtraction::extractRecursive):
(WebCore::TextExtraction::extractItem):
* Source/WebCore/page/text-extraction/TextExtraction.h: Added.
* Source/WebCore/page/text-extraction/TextExtractionTypes.h: Added.
* Source/WebKit/Scripts/webkit/messages.py:
(headers_for_type):
* Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in:
* Source/WebKit/UIProcess/WebPageProxy.cpp:
(WebKit::WebPageProxy::requestTextExtraction):
Add a `WebPageProxy` method and IPC endpoint, both unused for now, that will be adopted in
`WebViewImpl` and `WKContentView` in subsequent patches to vend collected items to system services.
* Source/WebKit/UIProcess/WebPageProxy.h:
* Source/WebKit/WebProcess/WebPage/WebPage.cpp:
(WebKit::WebPage::requestTextExtraction):
* Source/WebKit/WebProcess/WebPage/WebPage.h:
* Source/WebKit/WebProcess/WebPage/WebPage.messages.in:
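
For readers unfamiliar with the approach, the recursive walk described under `TextExtraction.cpp`
can be sketched roughly as follows. This is a minimal illustration only, not the code in this
commit: the `Node` and `Item` structs are hypothetical stand-ins for WebCore's DOM and the types in
`TextExtractionTypes.h`, and the function bodies are assumptions that merely reuse the names listed
above. It shows visible text being collected into a tree of items while subframes and hidden nodes
are skipped.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node; NOT WebCore's Node class.
struct Node {
    enum class Kind { Container, Text, Subframe };
    Kind kind { Kind::Container };
    std::string text; // Only meaningful for Kind::Text.
    bool visible { true };
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical extracted item; the real types live in TextExtractionTypes.h.
struct Item {
    std::string text;
    std::vector<Item> children;
};

// Assumption: subframes are opaque for now, mirroring the note in the log message.
static bool shouldIncludeChildren(const Node& node)
{
    return node.visible && node.kind != Node::Kind::Subframe;
}

// Depth-first walk that appends an item for each visible text node and
// for each container that ends up with at least one extracted descendant.
static void extractRecursive(const Node& node, Item& parent)
{
    if (!node.visible)
        return;

    if (node.kind == Node::Kind::Text) {
        Item leaf;
        leaf.text = node.text;
        parent.children.push_back(std::move(leaf));
        return;
    }

    if (!shouldIncludeChildren(node))
        return;

    Item item;
    for (const auto& child : node.children)
        extractRecursive(*child, item);

    // Drop containers that contributed nothing.
    if (!item.children.empty())
        parent.children.push_back(std::move(item));
}

// Entry point: returns a root item whose children mirror the extracted tree.
static Item extractItem(const Node& root)
{
    Item rootItem;
    for (const auto& child : root.children)
        extractRecursive(*child, rootItem);
    return rootItem;
}

// Hypothetical helper for building test trees.
static std::unique_ptr<Node> makeText(std::string text)
{
    auto node = std::make_unique<Node>();
    node->kind = Node::Kind::Text;
    node->text = std::move(text);
    return node;
}
```

Keeping the subframe check in one predicate means that later support for subframes would only need
to change `shouldIncludeChildren` in this sketch; whether the real patch is structured that way is
not stated in the log message.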
Canonical link: https://commits.webkit.org/273598@main