[webkit-changes] [WebKit/WebKit] 569a8b: Implement basic infrastructure to extract (primari...

Wenson Hsieh noreply at github.com
Fri Jan 26 22:58:37 PST 2024


  Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 569a8bd61f85e335544610a543eb86b808b09fa4
      https://github.com/WebKit/WebKit/commit/569a8bd61f85e335544610a543eb86b808b09fa4
  Author: Wenson Hsieh <wenson_hsieh at apple.com>
  Date:   2024-01-26 (Fri, 26 Jan 2024)

  Changed paths:
    M Source/WebCore/CMakeLists.txt
    M Source/WebCore/Headers.cmake
    M Source/WebCore/Sources.txt
    M Source/WebCore/WebCore.xcodeproj/project.pbxproj
    M Source/WebCore/page/Page.cpp
    A Source/WebCore/page/text-extraction/TextExtraction.cpp
    A Source/WebCore/page/text-extraction/TextExtraction.h
    A Source/WebCore/page/text-extraction/TextExtractionTypes.h
    M Source/WebKit/Scripts/webkit/messages.py
    M Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in
    M Source/WebKit/UIProcess/WebPageProxy.cpp
    M Source/WebKit/UIProcess/WebPageProxy.h
    M Source/WebKit/WebProcess/WebPage/WebPage.cpp
    M Source/WebKit/WebProcess/WebPage/WebPage.h
    M Source/WebKit/WebProcess/WebPage/WebPage.messages.in

  Log Message:
  -----------
  Implement basic infrastructure to extract (primarily text) content from webpages
https://bugs.webkit.org/show_bug.cgi?id=268171
rdar://121132162

Reviewed by Aditya Keerthi.

Add infrastructure to WebKit for extracting visible text from web content, for (eventual) donation
to system services. No change in behavior (yet).

* Source/WebCore/CMakeLists.txt:
* Source/WebCore/Headers.cmake:
* Source/WebCore/Sources.txt:
* Source/WebCore/WebCore.xcodeproj/project.pbxproj:
* Source/WebCore/page/Page.cpp:
* Source/WebCore/page/text-extraction/TextExtraction.cpp: Added.
(WebCore::TextExtraction::collectText):

Add a utility function to collect text over the entire document, and then recursively walk the DOM
to collect any other elements that are interesting for the purposes of text extraction; note that
this skips subframes for the time being, and doesn't handle `RemoteFrame`. Support will be added in
subsequent patches.
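
The recursive walk described above can be illustrated with a minimal, self-contained sketch. This is
not the code added by this patch: `Node`, `Item`, `makeText`, `makeElement`, and `appendText` are
hypothetical stand-ins for illustration only. The names `shouldIncludeChildren` and
`extractRecursive` merely echo the function names listed below; their real signatures and logic in
`TextExtraction.cpp` are not shown in this message, and the sketch omits the initial pass that
collects text over the entire document.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Toy stand-in for a DOM node (assumption; not a WebCore type).
struct Node {
    std::string tag;            // Empty for text nodes.
    std::string text;           // Only meaningful for text nodes.
    bool isSubframe { false };  // Marks content the walk should not descend into.
    std::vector<std::unique_ptr<Node>> children;
};

// Toy stand-in for an extracted item (assumption; not the patch's item type).
struct Item {
    std::string tag;
    std::string text;
    std::vector<Item> children;
};

inline std::unique_ptr<Node> makeText(std::string text)
{
    auto node = std::make_unique<Node>();
    node->text = std::move(text);
    return node;
}

inline std::unique_ptr<Node> makeElement(std::string tag)
{
    auto node = std::make_unique<Node>();
    node->tag = std::move(tag);
    return node;
}

// Hypothetical policy: skip the contents of subframes, as the text above says
// the patch does for the time being.
inline bool shouldIncludeChildren(const Node& node)
{
    return !node.isSubframe;
}

// Build an item for this node, then recurse into its children when allowed.
inline Item extractRecursive(const Node& node)
{
    Item item { node.tag, node.text, { } };
    if (!shouldIncludeChildren(node))
        return item;
    for (const auto& child : node.children)
        item.children.push_back(extractRecursive(*child));
    return item;
}

// Flatten the extracted tree into a single space-separated string.
inline void appendText(const Item& item, std::string& out)
{
    if (!item.text.empty()) {
        if (!out.empty())
            out += ' ';
        out += item.text;
    }
    for (const auto& child : item.children)
        appendText(child, out);
}

} // namespace sketch
```

In this sketch, a subframe node still yields an item of its own, but nothing beneath it is visited,
so text inside it never reaches the flattened output.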

(WebCore::TextExtraction::shouldIncludeChildren):
(WebCore::TextExtraction::rootViewBounds):
(WebCore::TextExtraction::extractItemData):
(WebCore::TextExtraction::extractRecursive):
(WebCore::TextExtraction::extractItem):
* Source/WebCore/page/text-extraction/TextExtraction.h: Added.
* Source/WebCore/page/text-extraction/TextExtractionTypes.h: Added.
* Source/WebKit/Scripts/webkit/messages.py:
(headers_for_type):
* Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in:
* Source/WebKit/UIProcess/WebPageProxy.cpp:
(WebKit::WebPageProxy::requestTextExtraction):

Add a `WebPageProxy` method and IPC endpoint, both unused for now, that will be adopted in
`WebViewImpl` and `WKContentView` in subsequent patches to vend collected items to system services.

* Source/WebKit/UIProcess/WebPageProxy.h:
* Source/WebKit/WebProcess/WebPage/WebPage.cpp:
(WebKit::WebPage::requestTextExtraction):
* Source/WebKit/WebProcess/WebPage/WebPage.h:
* Source/WebKit/WebProcess/WebPage/WebPage.messages.in:

Canonical link: https://commits.webkit.org/273598@main



