[webkit-dev] Request for position: Topics API

Wed Apr 6 12:18:43 PDT 2022

Hi Josh!

Thanks for reaching out. Hope to see you in person at some standards meeting soon! Please see our feedback on your proposal below.

   Regards, John

> On Mar 17, 2022, at 9:04 AM, Josh Karlin via webkit-dev <webkit-dev at lists.webkit.org> wrote:
> 
> Hi WebKit-Dev,
> 
> We've been working on the Topics API that allows for interest-based advertising in a browser ecosystem in which storage is partitioned by top-frame site. This API replaces our first proposal in this area, FLoC. We would like to hear what you think about it. Note that Chrome is implementing (with spec following shortly after) but we're quite open to evolving the API over time and are appreciative of your feedback.
> 
> explainer: https://github.com/jkarlin/topics
> chromestatus: https://chromestatus.com/feature/5680923054964736
> spec: TBD

The Topics API explainer is in a personal repository, which gives us pause about commenting since the proposal’s official status is unclear.

Our analysis of the proposal assumes full per-site partitioning and that no high-entropy device fingerprinting signals, such as IP address, are available cross-site. It’s important that any pre-existing privacy deficiencies on the web not be used as excuses for privacy deficiencies in new specs and proposals.

Apple does not think the Topics API is a good addition to the web platform. Here’s why:

Cross-site data. We don’t think cross-site data about the user’s browsing should be exposed in APIs. We’ve been working for ten years in the opposite direction, partitioning data per-site.

Cross-site sharing default. We don’t think cross-site data sharing should be on by default as a web platform feature. Users must have agency over expressing their personal interests to websites and third parties. A browser exposing this data by default is not acting as a user agent. Further, using the user’s browsing history as the basis for determining interests undermines users’ trust in the browser as their agent.

Cross-site targeting by default. We don’t think cross-site targeting of ads should be on by default as a web platform feature. Put another way, we don’t think cross-site targeting of ads should be the default experience on the web.

Safe to roam. The web should be safe to roam and the user agent should be working in that direction. Exposing cross-site data by default to facilitate personalized ad targeting would make the web less safe to roam. Users would always have to think twice about which sites they visit and how those visits could be used to manipulate or target them.

Enrichment of user profiles. Websites which already know a lot about a user can learn more through cross-site data APIs like the Topics API. Prime examples of such sites are the user’s search engine or social networking sites. Worse, topics connected to the user’s browsing will evolve over time, allowing continuous enrichment of the user profile as an ongoing privacy exposure. An example: The user was interested in honeymoons, then baby clothing, then lawyers.

Sensitive topics. What counts as sensitive information differs between, for instance, cultures, religions, ages, communities, and individuals. It is therefore not just hard but also foolish to think that browser vendors can come up with a safe set of personalized topics to expose to ad networks.

Topic bias. We understand that the current set of topics is not the one intended to be used in production. However, the set shows a concerning bias toward an affluent Western lifestyle, and we worry that the eventual standardized taxonomy will contain such biases too. A prime example in the current taxonomy is “World Music” as a term for all non-Western music.

Hidden patterns. We believe that technologies like machine learning will be able to glean personal data and patterns out of something like the Topics API that go far beyond whatever “safe” set of topics browser vendors define.

Advantages established players. The Topics API will only provide cross-site topic data to callers who called the API in the past for this particular user and on a site about that topic. This benefits entities that have scripts or frames embedded on many sites, e.g. already prevalent ad trackers, or owners of embeds with an ostensibly non-ad-related purpose such as social or video. And it perpetuates the incentive for more embedding solely for the purpose of cross-site data usage and not for any clear user benefit, thus needlessly hurting performance and battery life.

Who will classify sites? The open questions at the end of the explainer suggest that a taxonomy should be produced, and that it should become an industry standard. A sample taxonomy is available. But a taxonomy (at least as presented) is just a list of categories. Who decides which sites or pages are in which category? Is this a globally maintained list? Would it be a Google-provided service that requires Google’s permission to access? Would each browser do it separately (and perhaps differently)? Would sites self-label? Perhaps the intent is that the industry standard taxonomy would bucket sites or pages in the categories, but if so that’s not clear from the explainer, and if not, it seems like a major problem left unaddressed.
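
To make the per-caller filtering described under “Advantages established players” concrete, here is a minimal toy sketch in Python. It is not the Chrome implementation or the explainer’s algorithm: the class, method names, example domains, and the site-to-topic mapping are all hypothetical, and it models only the single rule stated above (a caller sees a topic only if it was previously present on a site about that topic for this user), ignoring every other detail of the proposal.

```python
from collections import defaultdict


class TopicsSketch:
    """Toy model of per-caller topic filtering (all names are hypothetical)."""

    def __init__(self, site_topics):
        # Hypothetical site -> topic mapping. Who produces this mapping is
        # the open classification question; here it is simply assumed.
        self.site_topics = dict(site_topics)
        # caller -> topics that caller has observed for this one user
        self.observed = defaultdict(set)
        # every topic derived from this user's browsing
        self.user_topics = set()

    def visit(self, site, callers):
        """Record a visit to `site` on which each of `callers` was embedded."""
        topic = self.site_topics.get(site)
        if topic is None:
            return  # unclassified site: nothing is recorded
        self.user_topics.add(topic)
        for caller in callers:
            self.observed[caller].add(topic)

    def topics_for(self, caller):
        """Return only the topics this caller has itself observed."""
        return sorted(self.user_topics & self.observed.get(caller, set()))


api = TopicsSketch({
    "travel.example": "Honeymoons",
    "baby.example": "Baby clothing",
})
# A widely embedded caller is present on both sites; a small one on only one.
api.visit("travel.example", ["big-tracker.example"])
api.visit("baby.example", ["big-tracker.example", "small-embed.example"])

print(api.topics_for("big-tracker.example"))
print(api.topics_for("small-embed.example"))
print(api.topics_for("newcomer.example"))
```

Under these assumptions, the caller embedded on both sites receives both topics, the caller embedded on one site receives one, and a caller with no prior presence receives none, which is the structural advantage for widely embedded parties that the point above describes.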
