[webkit-changes] [WebKit/WebKit] 4bb58f: Augment text extraction support to include informa...

Wenson Hsieh noreply at github.com
Mon Feb 19 18:15:51 PST 2024


  Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 4bb58f66236e7ff4fe1ad111373ad063c5a56b2e
      https://github.com/WebKit/WebKit/commit/4bb58f66236e7ff4fe1ad111373ad063c5a56b2e
  Author: Wenson Hsieh <wenson_hsieh at apple.com>
  Date:   2024-02-19 (Mon, 19 Feb 2024)

  Changed paths:
    M LayoutTests/TestExpectations
    A LayoutTests/fast/text-extraction/basic-text-extraction-expected.txt
    A LayoutTests/fast/text-extraction/basic-text-extraction.html
    M LayoutTests/platform/ios-wk2/TestExpectations
    M LayoutTests/platform/mac-wk2/TestExpectations
    M LayoutTests/resources/ui-helper.js
    M Source/WebCore/page/text-extraction/TextExtraction.cpp
    M Source/WebCore/page/text-extraction/TextExtractionTypes.h
    M Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in
    M Source/WebKit/UIProcess/Cocoa/WKTextExtractionItem.h
    M Source/WebKit/UIProcess/Cocoa/WKTextExtractionUtilities.mm
    M Source/WebKit/UIProcess/Cocoa/WebKitSwiftSoftLink.h
    M Source/WebKit/UIProcess/Cocoa/WebKitSwiftSoftLink.mm
    M Source/WebKit/WebKitSwift/TextExtraction/WKTextExtractionItem.swift
    M Tools/TestRunnerShared/UIScriptContext/Bindings/UIScriptController.idl
    M Tools/TestRunnerShared/UIScriptContext/UIScriptContext.cpp
    M Tools/TestRunnerShared/UIScriptContext/UIScriptContext.h
    M Tools/TestRunnerShared/UIScriptContext/UIScriptController.h
    M Tools/WebKitTestRunner/Configurations/Base.xcconfig
    M Tools/WebKitTestRunner/TestOptions.cpp
    M Tools/WebKitTestRunner/TestOptions.h
    M Tools/WebKitTestRunner/WebKitTestRunner.xcodeproj/project.pbxproj
    M Tools/WebKitTestRunner/cocoa/TestControllerCocoa.mm
    M Tools/WebKitTestRunner/cocoa/UIScriptControllerCocoa.h
    M Tools/WebKitTestRunner/cocoa/UIScriptControllerCocoa.mm
    A Tools/WebKitTestRunner/cocoa/WKTextExtractionTestingHelpers.h
    A Tools/WebKitTestRunner/cocoa/WKTextExtractionTestingHelpers.mm

  Log Message:
  -----------
  Augment text extraction support to include information about the selection, links and editable elements
https://bugs.webkit.org/show_bug.cgi?id=269682
rdar://123201244

Reviewed by Aditya Keerthi.

Extend support for text extraction to match the latest version of the corresponding system APIs. See
comments below for more details.

Test: fast/text-extraction/basic-text-extraction.html

* LayoutTests/TestExpectations:
* LayoutTests/fast/text-extraction/basic-text-extraction-expected.txt: Added.
* LayoutTests/fast/text-extraction/basic-text-extraction.html: Added.

Add a new layout test to exercise this logic (enabled on WebKit2 macOS and iOS only). This test
exercises all of the different item subclasses, as well as some more interesting scenarios:
(1) text, links and images nested inside of other structural DOM elements (lists, list items), and
(2) links inside of contenteditable elements.

* LayoutTests/platform/ios-wk2/TestExpectations:
* LayoutTests/platform/mac-wk2/TestExpectations:
* LayoutTests/resources/ui-helper.js:
(window.UIHelper.requestTextExtraction):
(window.UIHelper):

Add a new `UIHelper` method to dump the text extraction tree as text, for testing purposes.

* Source/WebCore/page/text-extraction/TextExtraction.cpp:

Make several changes to the underlying model objects:

1.  For the container types, remove `Link` (now that it's represented purely as text).

2.  Also, turn `Button` into a container type and remove `InteractiveItemData`; this is because
    there is not currently a system type that represents buttons/interactive elements.

3.  Finally, revamp how `TextItemData` works:
    a.  Treat all links as ranges in text, represented by a list of `(URL, Range)` per text item.
    b.  Add an optional `selectedRange` member that represents what part of the text item is in the
        user's current selection.
    c.  Remove `EditableItemData` below, and instead add an optional `editable` member to each
        text item that contains metadata about the editing host.

To match platform expectations, we now merge adjacent text items together during traversal, with
links and selection ranges embedded inside of text items. When encountering an editable container or
host (e.g. `input`, or contenteditable `div`) we now *only* extract text items underneath that
subtree, using the aforementioned coalescing mechanism to ensure that all text items in the subtree
get appended to a single item with the `editable` member. We apply a similar treatment to links as
well, extracting only text underneath them.
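
As a rough illustration of the revamped text item described above, here is a minimal standalone sketch. It uses standard library types instead of WTF/WebCore ones, and every name in it (`CharacterRange`, `EditableInfo`, `TextItem`, `linkedText`) is a hypothetical stand-in rather than the actual definition in `TextExtractionTypes.h`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a character range within a text item's content.
struct CharacterRange {
    std::size_t location;
    std::size_t length;
};

// Hypothetical metadata about the editing host; the real fields may differ.
struct EditableInfo {
    std::string label;
    bool isFocused { false };
};

// Sketch of a text item: links and the selection are ranges into `content`.
struct TextItem {
    std::string content;
    std::vector<std::pair<std::string, CharacterRange>> links; // (URL, range)
    std::optional<CharacterRange> selectedRange; // absent when nothing is selected
    std::optional<EditableInfo> editable; // present only for editable text
};

// Returns the substring of `content` covered by the link at `index`.
std::string linkedText(const TextItem& item, std::size_t index)
{
    const CharacterRange& range = item.links.at(index).second;
    return item.content.substr(range.location, range.length);
}
```

With this shape, a link no longer needs its own container item: it is only a URL attached to a range of the surrounding text.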

(WebCore::TextExtraction::collectText):

Refactor this `collectText` method, such that it now returns a `HashMap` of all text nodes with
visible text in the document, as well as ranges in the visible text that are selected by the user.
To achieve this, if there is a ranged selection in the document, we'll divide the text iteration
into three phases: (1) collecting text before the start of the selection, (2) collecting text inside
of the selection, and (3) collecting text after the end of the selection. These three results are
then stitched together to produce the above `HashMap` with per-text-node selection ranges.
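
The three-phase stitching can be sketched roughly as follows. This is a simplified standalone model, not the WebCore code: text nodes are identified by an integer ID, `std::map` stands in for `HashMap`, and `Fragment`, `CollectedText`, `appendPhase` and `stitch` are hypothetical names. Each phase is assumed to have already produced its per-node text fragments in document order.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct CharacterRange {
    std::size_t location;
    std::size_t length;
};

// Visible text collected for one text node, plus the selected part of it, if any.
struct CollectedText {
    std::string text;
    std::optional<CharacterRange> selectedRange;
};

// A piece of visible text emitted for one text node during one phase.
struct Fragment {
    int nodeID;
    std::string text;
};

using TextMap = std::map<int, CollectedText>;

// Appends one phase's fragments; fragments from the "inside selection" phase
// also create or extend the node's selected range.
void appendPhase(TextMap& result, const std::vector<Fragment>& fragments, bool insideSelection)
{
    for (const auto& fragment : fragments) {
        auto& entry = result[fragment.nodeID];
        if (insideSelection && !fragment.text.empty()) {
            if (!entry.selectedRange)
                entry.selectedRange = CharacterRange { entry.text.size(), fragment.text.size() };
            else
                entry.selectedRange->length += fragment.text.size();
        }
        entry.text += fragment.text;
    }
}

// Stitches the three phases together in document order.
TextMap stitch(const std::vector<Fragment>& before, const std::vector<Fragment>& selected, const std::vector<Fragment>& after)
{
    TextMap result;
    appendPhase(result, before, false);
    appendPhase(result, selected, true);
    appendPhase(result, after, false);
    return result;
}
```

A node that straddles a selection boundary contributes fragments to two phases, which is why the selected range's location is taken from the text accumulated so far for that node.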

(WebCore::TextExtraction::TraversalContext::shouldIncludeNodeWithRect const):

Add a `TraversalContext` struct to help encapsulate tree state while traversing the DOM to build up
text extraction results. This now includes `onlyCollectTextAndLinksCount`, which is set when
traversing into links or editable containers.
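
One possible way to maintain such a counter while recursing is a small scope guard, sketched below. Only the member name `onlyCollectTextAndLinksCount` comes from this change; the `TextAndLinksOnlyScope` helper and the `onlyCollectsTextAndLinks` accessor are hypothetical, and the actual code may simply increment and decrement the counter by hand.

```cpp
// Minimal stand-in for the traversal state; the real struct carries more members.
struct TraversalContext {
    unsigned onlyCollectTextAndLinksCount { 0 };

    // True while the traversal is inside at least one link or editable container.
    bool onlyCollectsTextAndLinks() const { return onlyCollectTextAndLinksCount > 0; }
};

// Hypothetical RAII helper: bumps the counter on entry to a link or editable
// container and restores it when the subtree has been traversed.
class TextAndLinksOnlyScope {
public:
    explicit TextAndLinksOnlyScope(TraversalContext& context)
        : m_context(context)
    {
        ++m_context.onlyCollectTextAndLinksCount;
    }

    ~TextAndLinksOnlyScope() { --m_context.onlyCollectTextAndLinksCount; }

    TextAndLinksOnlyScope(const TextAndLinksOnlyScope&) = delete;
    TextAndLinksOnlyScope& operator=(const TextAndLinksOnlyScope&) = delete;

private:
    TraversalContext& m_context;
};
```

Using a count rather than a boolean keeps the state correct when such subtrees nest, for example a link inside a contenteditable element.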

(WebCore::TextExtraction::canMerge):
(WebCore::TextExtraction::merge):

Add helper methods to merge text nodes into adjacent nodes.
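
A minimal sketch of what merging two adjacent text items could look like is shown below. The types are simplified stand-ins, and the merge policy (refusing to merge two items that are both editable, and assuming a contiguous selection) is an assumption made for illustration, not a description of the actual `canMerge`/`merge` logic.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct Range {
    std::size_t location;
    std::size_t length;
};

struct Link {
    std::string url;
    Range range;
};

// Simplified text item; `editable` is reduced to a flag for this sketch.
struct TextItem {
    std::string content;
    std::vector<Link> links;
    std::optional<Range> selectedRange;
    bool editable { false };
};

// Assumed policy: do not merge text belonging to two different editable hosts.
bool canMerge(const TextItem& destination, const TextItem& source)
{
    return !(destination.editable && source.editable);
}

// Appends `source` to `destination`, shifting the source's ranges by the
// length of the destination's existing content.
void merge(TextItem& destination, const TextItem& source)
{
    const std::size_t offset = destination.content.size();
    destination.content += source.content;

    for (const auto& link : source.links)
        destination.links.push_back(Link { link.url, Range { link.range.location + offset, link.range.length } });

    if (source.selectedRange) {
        Range shifted { source.selectedRange->location + offset, source.selectedRange->length };
        if (!destination.selectedRange)
            destination.selectedRange = shifted;
        else {
            // Assumes the selection is contiguous across both items.
            const std::size_t start = destination.selectedRange->location;
            destination.selectedRange = Range { start, shifted.location + shifted.length - start };
        }
    }

    if (source.editable)
        destination.editable = true;
}
```

The key detail is the offset: any range that was relative to the source item's text has to be rebased onto the merged content.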

(WebCore::TextExtraction::labelText):
(WebCore::TextExtraction::extractItemData):
(WebCore::TextExtraction::extractRecursive):
(WebCore::TextExtraction::pruneRedundantItemsRecursive):

To avoid emitting too many spurious whitespace nodes, add one final pass through the extraction
results before returning them, to prune all non-editable text nodes that only contain whitespace.
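
Such a pruning pass might look roughly like the sketch below. The `Item` struct and the `pruneWhitespaceOnlyText` name are hypothetical simplifications, and treating an empty string as whitespace-only is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Simplified extraction item: either a text leaf or a container with children.
struct Item {
    bool isText { false };
    bool editable { false };
    std::string text;
    std::vector<Item> children;
};

// True when every character is whitespace (also true for an empty string).
bool isWhitespaceOnly(const std::string& text)
{
    return std::all_of(text.begin(), text.end(), [](unsigned char character) {
        return std::isspace(character) != 0;
    });
}

// Removes non-editable, whitespace-only text leaves from the whole subtree.
void pruneWhitespaceOnlyText(Item& item)
{
    auto& children = item.children;
    children.erase(std::remove_if(children.begin(), children.end(), [](const Item& child) {
        return child.isText && !child.editable && child.children.empty() && isWhitespaceOnly(child.text);
    }), children.end());

    for (auto& child : children)
        pruneWhitespaceOnlyText(child);
}
```

Editable items are kept even when they only contain whitespace, since an empty text field is still meaningful to a client.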

(WebCore::TextExtraction::extractItem):
(WebCore::TextExtraction::shouldIncludeChildren): Deleted.
* Source/WebCore/page/text-extraction/TextExtractionTypes.h:

See above for more details.

* Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in:
* Source/WebKit/UIProcess/Cocoa/WKTextExtractionItem.h:
* Source/WebKit/UIProcess/Cocoa/WKTextExtractionUtilities.mm:
(WebKit::containerType):
(WebKit::createWKTextItem):
(WebKit::createItemWithChildren):
(WebKit::createItemRecursive):
(WebKit::createItemIgnoringChildren): Deleted.
* Source/WebKit/UIProcess/Cocoa/WebKitSwiftSoftLink.h:
* Source/WebKit/UIProcess/Cocoa/WebKitSwiftSoftLink.mm:

Update the Swift and ObjC wrappers to match the changes in `TextExtractionTypes.h`.

* Source/WebKit/WebKitSwift/TextExtraction/WKTextExtractionItem.swift:
(WKTextExtractionItem.children): Deleted.
* Tools/TestRunnerShared/UIScriptContext/Bindings/UIScriptController.idl:
* Tools/TestRunnerShared/UIScriptContext/UIScriptContext.cpp:
(UIScriptContext::asyncTaskComplete):

Add an optional `std::initializer_list<JSValueRef>` argument here, so that we can pass JS values as
arguments back to test runner callbacks.
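
The shape of such an optional argument can be sketched as follows. To keep the example standalone, `ValueRef` is a stand-in for `JSValueRef`, and `RecordingContext` is a hypothetical class that only records how many arguments it was given instead of invoking a JS callback.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

using ValueRef = const void*; // stand-in for JSValueRef in this sketch

// Hypothetical context that records the argument count of each completion.
struct RecordingContext {
    std::vector<std::size_t> argumentCounts;

    // Defaulting the list to empty keeps existing one-argument call sites valid.
    void asyncTaskComplete(unsigned callbackID, std::initializer_list<ValueRef> arguments = { })
    {
        (void)callbackID; // a real implementation would look up the callback by ID
        argumentCounts.push_back(arguments.size());
    }
};
```

Existing callers can keep writing `asyncTaskComplete(callbackID)`, while new callers can pass values with `asyncTaskComplete(callbackID, { value })`.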

* Tools/TestRunnerShared/UIScriptContext/UIScriptContext.h:
(WTR::UIScriptContext::asyncTaskComplete):
* Tools/TestRunnerShared/UIScriptContext/UIScriptController.h:
(WTR::UIScriptController::requestTextExtraction):
* Tools/WebKitTestRunner/Configurations/Base.xcconfig:

Add `WebKit/UIProcess/Cocoa` to the header search paths, so that we can include
`WKTextExtractionItem.h` in API tests.

* Tools/WebKitTestRunner/TestOptions.cpp:
(WTR::TestOptions::defaults):
(WTR::TestOptions::keyTypeMapping):
* Tools/WebKitTestRunner/TestOptions.h:
(WTR::TestOptions::textExtractionEnabled const):

Add a test option to enable the `_textExtractionEnabled` SPI preference.

* Tools/WebKitTestRunner/WebKitTestRunner.xcodeproj/project.pbxproj:
* Tools/WebKitTestRunner/cocoa/TestControllerCocoa.mm:
(WTR::TestController::cocoaResetStateToConsistentValues):
* Tools/WebKitTestRunner/cocoa/UIScriptControllerCocoa.h:
* Tools/WebKitTestRunner/cocoa/UIScriptControllerCocoa.mm:
(WTR::UIScriptControllerCocoa::requestTextExtraction):
* Tools/WebKitTestRunner/cocoa/WKTextExtractionTestingHelpers.h: Copied from Source/WebKit/UIProcess/Cocoa/WebKitSwiftSoftLink.h.
* Tools/WebKitTestRunner/cocoa/WKTextExtractionTestingHelpers.mm: Added.
(WTR::description):
(WTR::buildDescriptionIgnoringChildren):
(WTR::buildRecursiveDescription):
(WTR::recursiveDescription):

Add support for the new text extraction layout test; see above.

Canonical link: https://commits.webkit.org/275013@main