<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[287592] trunk</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }
#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/287592">287592</a></dd>
<dt>Author</dt> <dd>wenson_hsieh@apple.com</dd>
<dt>Date</dt> <dd>2022-01-04 15:31:23 -0800 (Tue, 04 Jan 2022)</dd>
</dl>

<h3>Log Message</h3>
<pre>Use ICU instead of relying on hard-coded string equality checks in ModalContainerControlClassifier
https://bugs.webkit.org/show_bug.cgi?id=234677

Reviewed by Tim Horton.

Source/WebKit:

Followup to <a href="http://trac.webkit.org/projects/webkit/changeset/287420">r287420</a> - use ICU to check for more strings that resemble either the lowercase or uppercase letter
"x", rather than relying on a hard-coded set of symbols. Note that ICU's "confusables" list currently considers
neither ✖ nor ✕ to be a lookalike of the letter "x"; since these symbols are known to appear in modal containers on
several websites, we still need to check for these two symbols separately.

Test: ModalContainerObservation.ClassifyMultiplySymbol

* UIProcess/Cocoa/ModalContainerControlClassifier.mm:
(WebKit::SpoofChecker::~SpoofChecker):

Add a helper class that wraps calls to `uspoof_areConfusableUTF8`, and also ensures balanced calls to
`uspoof_open` and `uspoof_close` when creating a new ICU spoof checker. Use this in the
WKModalContainerClassifierInput class below to check for more types of strings that look like the letter "x".

(WebKit::SpoofChecker::areConfusable):
(WebKit::SpoofChecker::checker):
(-[WKModalContainerClassifierInput initWithTokenizer:rawInput:]):

Tools:

Augment the existing API test so that it additionally tests a symbol ("small roman numeral ten") that would not
have been covered as one of the three hard-coded symbol strings in the earlier fix.

* TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm:
(TestWebKitAPI::TEST):</pre>
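
<p>As a rough illustration of the pattern described in the log message (open the checker lazily, close it in the destructor, and keep an explicit check for the two symbols the confusables data misses), here is a minimal, self-contained C++ sketch. It deliberately does not call ICU: <code>fakeOpen</code>, <code>fakeClose</code>, <code>fakeAreConfusable</code>, <code>LazyChecker</code> and <code>looksLikeLetterX</code> are hypothetical stand-ins, and the toy lookalike rule is an assumption rather than ICU's data.</p>
<pre>
```cpp
#include <string>

// Hypothetical stand-ins for a C-style handle API. These are NOT ICU functions;
// they only mimic the open/close/query shape so the sketch builds without ICU.
struct FakeHandle { };
static int g_openCount = 0;
static int g_closeCount = 0;

static FakeHandle* fakeOpen(int* status)
{
    *status = 0; // 0 means "no error" in this sketch.
    ++g_openCount;
    return new FakeHandle;
}

static void fakeClose(FakeHandle* handle)
{
    ++g_closeCount;
    delete handle;
}

// Toy rule (an assumption, not ICU's confusables data): the multiplication sign
// U+00D7 counts as a lookalike of "x" and "X"; identical strings also match.
static int fakeAreConfusable(FakeHandle*, const std::string& candidate, const std::string& target, int* status)
{
    if (*status != 0)
        return 0;
    if (candidate == target)
        return 1;
    return candidate == "\xC3\x97" && (target == "x" || target == "X");
}

// Owns at most one handle: opened on first use, closed in the destructor,
// so every successful open is balanced by exactly one close.
class LazyChecker {
public:
    LazyChecker() = default;
    LazyChecker(const LazyChecker&) = delete;
    LazyChecker& operator=(const LazyChecker&) = delete;

    ~LazyChecker()
    {
        if (m_handle)
            fakeClose(m_handle);
    }

    bool areConfusable(const std::string& candidate, const std::string& target)
    {
        FakeHandle* current = handle();
        return current && fakeAreConfusable(current, candidate, target, &m_status);
    }

private:
    FakeHandle* handle()
    {
        // Only try to open while no error has been recorded.
        if (!m_handle && m_status == 0)
            m_handle = fakeOpen(&m_status);
        return m_handle;
    }

    int m_status { 0 };
    FakeHandle* m_handle { nullptr };
};

// Explicit matches come first, so the handle is never opened for them.
bool looksLikeLetterX(const std::string& token)
{
    LazyChecker checker;
    return token == "\xE2\x9C\x95" // U+2715 MULTIPLICATION X
        || token == "\xE2\x9C\x96" // U+2716 HEAVY MULTIPLICATION X
        || checker.areConfusable(token, "x")
        || checker.areConfusable(token, "X");
}
```
</pre>
<p>With these stand-ins, <code>looksLikeLetterX</code> accepts ✕ and ✖ through the explicit comparisons and × through the toy lookalike rule, and any handle that was opened is closed again when the checker goes out of scope.</p>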

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceWebKitChangeLog">trunk/Source/WebKit/ChangeLog</a></li>
<li><a href="#trunkSourceWebKitUIProcessCocoaModalContainerControlClassifiermm">trunk/Source/WebKit/UIProcess/Cocoa/ModalContainerControlClassifier.mm</a></li>
<li><a href="#trunkToolsChangeLog">trunk/Tools/ChangeLog</a></li>
<li><a href="#trunkToolsTestWebKitAPITestsWebKitCocoaModalContainerObservationmm">trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceWebKitChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebKit/ChangeLog (287591 => 287592)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebKit/ChangeLog    2022-01-04 23:05:44 UTC (rev 287591)
+++ trunk/Source/WebKit/ChangeLog       2022-01-04 23:31:23 UTC (rev 287592)
</span><span class="lines">@@ -1,3 +1,28 @@
</span><ins>+2022-01-04  Wenson Hsieh  <wenson_hsieh@apple.com>
+
+        Use ICU instead of relying on hard-coded string equality checks in ModalContainerControlClassifier
+        https://bugs.webkit.org/show_bug.cgi?id=234677
+
+        Reviewed by Tim Horton.
+
+        Followup to r287420 - use ICU to check for more strings that resemble either the lowercase or uppercase letter
+        "x", rather than relying on a hard-coded set of symbols. Note that ICU's "confusables" list currently considers
+        neither ✖ nor ✕ to be a lookalike of the letter "x"; since these symbols are known to appear in modal containers on
+        several websites, we still need to check for these two symbols separately.
+
+        Test: ModalContainerObservation.ClassifyMultiplySymbol
+
+        * UIProcess/Cocoa/ModalContainerControlClassifier.mm:
+        (WebKit::SpoofChecker::~SpoofChecker):
+
+        Add a helper class that wraps calls to `uspoof_areConfusableUTF8`, and also ensures balanced calls to
+        `uspoof_open` and `uspoof_close` when creating a new ICU spoof checker. Use this in the
+        WKModalContainerClassifierInput class below to check for more types of strings that look like the letter "x".
+
+        (WebKit::SpoofChecker::areConfusable):
+        (WebKit::SpoofChecker::checker):
+        (-[WKModalContainerClassifierInput initWithTokenizer:rawInput:]):
+
</ins><span class="cx"> 2022-01-04  Per Arne Vollan  <pvollan@apple.com>
</span><span class="cx"> 
</span><span class="cx">         [iOS][WP] Add telemetry for syscall violations
</span></span></pre></div>
<a id="trunkSourceWebKitUIProcessCocoaModalContainerControlClassifiermm"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebKit/UIProcess/Cocoa/ModalContainerControlClassifier.mm (287591 => 287592)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebKit/UIProcess/Cocoa/ModalContainerControlClassifier.mm   2022-01-04 23:05:44 UTC (rev 287591)
+++ trunk/Source/WebKit/UIProcess/Cocoa/ModalContainerControlClassifier.mm      2022-01-04 23:31:23 UTC (rev 287592)
</span><span class="lines">@@ -27,6 +27,8 @@
</span><span class="cx"> #import "ModalContainerControlClassifier.h"
</span><span class="cx"> 
</span><span class="cx"> #import <WebCore/ModalContainerTypes.h>
</span><ins>+#import <unicode/uspoof.h>
+
</ins><span class="cx"> #import <pal/cocoa/CoreMLSoftLink.h>
</span><span class="cx"> #import <pal/cocoa/NaturalLanguageSoftLink.h>
</span><span class="cx"> 
</span><span class="lines">@@ -74,6 +76,36 @@
</span><span class="cx"> 
</span><span class="cx"> @end
</span><span class="cx"> 
</span><ins>+namespace WebKit {
+
+class SpoofChecker {
+    WTF_MAKE_FAST_ALLOCATED;
+public:
+    ~SpoofChecker()
+    {
+        if (m_checker)
+            uspoof_close(m_checker);
+    }
+
+    bool areConfusable(NSString *potentialSpoofString, const char* stringToSpoof)
+    {
+        return checker() && uspoof_areConfusableUTF8(checker(), potentialSpoofString.UTF8String, -1, stringToSpoof, -1, &m_status);
+    }
+
+private:
+    USpoofChecker* checker()
+    {
+        if (!m_checker && m_status == U_ZERO_ERROR)
+            m_checker = uspoof_open(&m_status);
+        return m_checker;
+    }
+
+    UErrorCode m_status { U_ZERO_ERROR };
+    USpoofChecker* m_checker { nullptr };
+};
+
+} // namespace WebKit
+
</ins><span class="cx"> @implementation WKModalContainerClassifierInput {
</span><span class="cx">     RetainPtr<NSString> _canonicalInput;
</span><span class="cx"> }
</span><span class="lines">@@ -95,10 +127,11 @@
</span><span class="cx">             return;
</span><span class="cx"> 
</span><span class="cx">         if (attributes & (NLTokenizerAttributeSymbolic | NLTokenizerAttributeEmoji)) {
</span><del>-            // We should consider using a memory-compact hash map if we need to add a large number of entries here in the future.
-            // For now, we only make an exception for the following symbols, so simply checking each string is sufficient.
-            if ([lowercaseToken isEqualToString:@"×"] || [lowercaseToken isEqualToString:@"✕"] || [lowercaseToken isEqualToString:@"✖"])
</del><ins>+            WebKit::SpoofChecker checker;
+            if ([lowercaseToken isEqualToString:@"✕"] || [lowercaseToken isEqualToString:@"✖"] || checker.areConfusable(lowercaseToken, "x") || checker.areConfusable(lowercaseToken, "X")) {
+                // ICU does not consider ✕ or ✖ to be confusable with the letter x, so they are matched explicitly; for the purposes of the classifier we treat them as if they were.
</ins><span class="cx">                 [tokens addObject:@"x"];
</span><ins>+            }
</ins><span class="cx">             return;
</span><span class="cx">         }
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkToolsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Tools/ChangeLog (287591 => 287592)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Tools/ChangeLog    2022-01-04 23:05:44 UTC (rev 287591)
+++ trunk/Tools/ChangeLog       2022-01-04 23:31:23 UTC (rev 287592)
</span><span class="lines">@@ -1,5 +1,18 @@
</span><span class="cx"> 2022-01-04  Wenson Hsieh  <wenson_hsieh@apple.com>
</span><span class="cx"> 
</span><ins>+        Use ICU instead of relying on hard-coded string equality checks in ModalContainerControlClassifier
+        https://bugs.webkit.org/show_bug.cgi?id=234677
+
+        Reviewed by Tim Horton.
+
+        Augment the existing API test so that it additionally tests a symbol ("small roman numeral ten") that would not
+        have been covered as one of the three hard-coded symbol strings in the earlier fix.
+
+        * TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm:
+        (TestWebKitAPI::TEST):
+
+2022-01-04  Wenson Hsieh  <wenson_hsieh@apple.com>
+
</ins><span class="cx">         ModalContainerObserver should search for text in subframes
</span><span class="cx">         https://bugs.webkit.org/show_bug.cgi?id=234446
</span><span class="cx">         rdar://86897770
</span></span></pre></div>
<a id="trunkToolsTestWebKitAPITestsWebKitCocoaModalContainerObservationmm"></a>
<div class="modfile"><h4>Modified: trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm (287591 => 287592)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm 2022-01-04 23:05:44 UTC (rev 287591)
+++ trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm    2022-01-04 23:31:23 UTC (rev 287592)
</span><span class="lines">@@ -234,11 +234,16 @@
</span><span class="cx"> TEST(ModalContainerObservation, ClassifyMultiplySymbol)
</span><span class="cx"> {
</span><span class="cx">     auto webView = createModalContainerWebView();
</span><del>-    [webView loadBundlePage:@"modal-container-custom"];
-    [webView evaluate:@"show(`<p>Hello world</p><button>×</button>`)" andDecidePolicy:_WKModalContainerDecisionHideAndIgnore];
</del><ins>+    auto runTest = [&] (NSString *symbol) {
+        [webView loadBundlePage:@"modal-container-custom"];
+        NSString *scriptToEvaluate = [NSString stringWithFormat:@"show(`<p>Hello world</p><button>%@</button>`)", symbol];
+        [webView evaluate:scriptToEvaluate andDecidePolicy:_WKModalContainerDecisionHideAndIgnore];
</ins><span class="cx"> 
</span><del>-    EXPECT_FALSE([[webView contentsAsString] containsString:@"Hello world"]);
-    EXPECT_EQ([webView lastModalContainerInfo].availableTypes, _WKModalContainerControlTypeNeutral);
</del><ins>+        EXPECT_FALSE([[webView contentsAsString] containsString:@"Hello world"]);
+        EXPECT_EQ([webView lastModalContainerInfo].availableTypes, _WKModalContainerControlTypeNeutral);
+    };
+    runTest(@"✕");
+    runTest(@"⨯");
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> TEST(ModalContainerObservation, DetectSearchTermInBoldTag)
</span></span></pre>
</div>
</div>

</body>
</html>