[Webkit-unassigned] [Bug 273435] New: PDF.js contains binary code
bugzilla-daemon at webkit.org
Mon Apr 29 14:26:00 PDT 2024
https://bugs.webkit.org/show_bug.cgi?id=273435
Bug ID: 273435
Summary: PDF.js contains binary code
Product: WebKit
Version: WebKit Nightly Build
Hardware: PC
OS: Linux
Status: NEW
Severity: Normal
Priority: P2
Component: PDF
Assignee: webkit-unassigned at lists.webkit.org
Reporter: mcatanzaro at redhat.com
CC: thorton at apple.com
We should update to PDF.js v4.2.67.
However, this is going to be more complicated than usual because this release contains a binary wasm module for processing JPEG 2000 images: https://github.com/mozilla/pdf.js/pull/17946/files#diff-0c3dc243c4697cac89b08327394744c65a35d3ff8f7e0badcd98f40229c7e2cb. I think that goes too far and we don't want it. The JPEG 2000 image format is old and obscure, and I think it's OK not to fully support PDFs that use it.
It's bad enough that we ship prebuilt JS files instead of the preferred modifiable source, but at least those are still human-readable and not minified or obfuscated or anything (and we can't do much about it, because the build process depends on a large number of node.js modules that we surely don't want WebKit to depend on). Even though probably nobody will ever inspect the built JS code to see what it's doing, at least it's *possible* to do so. That's no longer true if we ship binary code compiled by somebody else. (Interestingly, it *looks* like a text file rather than a binary, because it's base64 encoded and embedded into pdf.worker.mjs.)
Anyway, removing the code that uses the wasm should be easy enough. We can add a patch to sabotage the JPX decoder, which currently looks like this:
  // jpx.js, JpxImage class (excerpt)
  static decode(data, ignoreColorSpace) {
    // Create the OpenJPEG (wasm) module on first use, then reuse it.
    this.#module ||= OpenJPEG();
    const imageData = this.#module.decode(data, ignoreColorSpace);
    if (!imageData) {
      throw new JpxError("JPX decode failed");
    }
    return imageData;
  }
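A minimal sketch of what the patched method might reduce to, assuming callers already cope with a thrown JpxError (they presumably must, since the original code throws one whenever decoding fails). The JpxError class below is only a stand-in so the snippet runs on its own, and the error message is made up; only decode() is shown.

```javascript
// Stand-in for pdf.js's JpxError so this sketch runs on its own.
class JpxError extends Error {
  constructor(message) {
    super(message);
    this.name = "JpxError";
  }
}

class JpxImage {
  // Patched decode(): never instantiate the OpenJPEG wasm module.
  // Every JPEG 2000 image is rejected through the same error path the
  // original code takes when decoding fails.
  static decode(data, ignoreColorSpace) {
    throw new JpxError("JPX decoding is disabled in this build");
  }
}
```

With the OpenJPEG() call gone, the wasm module should no longer be reachable from the worker, which is what would make it reasonable for a script to strip that code out of the built file.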
We'll also need a script to automatically remove the jpx.js source section from pdf.worker.mjs. (The same script could apply the patch and maybe run some of the other steps that are currently manual.)
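A minimal sketch of the mechanical core of such a script. It assumes the jpx.js section can be delimited by two literal marker strings picked by inspecting the built file (for example per-module header comments, if the bundler leaves them); `stripSection` and the marker values are hypothetical. The end marker is kept because it is assumed to be the start of the following section.

```javascript
// Remove everything from `startMarker` up to (but not including)
// `endMarker`. Throws if either marker is missing, so a change in the
// upstream build layout makes the script fail loudly instead of silently
// leaving the wasm blob in place.
function stripSection(source, startMarker, endMarker) {
  const start = source.indexOf(startMarker);
  if (start === -1) {
    throw new Error(`start marker not found: ${startMarker}`);
  }
  const end = source.indexOf(endMarker, start + startMarker.length);
  if (end === -1) {
    throw new Error(`end marker not found: ${endMarker}`);
  }
  return source.slice(0, start) + source.slice(end);
}
```

Keeping the function pure (string in, string out) leaves reading and writing pdf.worker.mjs to whatever runtime the update script uses, and makes the stripping step easy to test on its own.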
I thought the above was a good plan, but then I decided to check to be sure there isn't more wasm binary content in pdf.js. Unfortunately there is: quickjs-eval.js, which is required to implement the PDF.js sandbox, so removing it would have security implications. :S I'm pretty sure it's unacceptable to have this in WebKit, but we might have to discuss this with upstream before deciding what to do. One possibility is to actually depend on node.js, which would be sad.
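Checking for embedded wasm can be partly automated. A minimal sketch, assuming each embedded module is stored as one contiguous base64 string (as in a `data:` URL): a wasm binary starts with the magic bytes `\0asm` followed by a 4-byte little-endian version, and when the base64 encoding starts at the module's first byte the text begins with `AGFzbQ`. `findEmbeddedWasm` is a hypothetical helper; `atob` is the standard global in browsers and recent Node.js.

```javascript
// Base64 runs that start like an encoded "\0asm" header.
const WASM_BASE64_RUN = /AGFzbQ[A-Za-z0-9+/]*={0,2}/g;

// Return the offset, length and wasm version of every base64 run in
// `source` whose first bytes decode to a wasm header.
function findEmbeddedWasm(source) {
  const hits = [];
  for (const match of source.matchAll(WASM_BASE64_RUN)) {
    const run = match[0];
    // 12 base64 characters decode to up to 9 bytes, enough for the
    // 8-byte header (magic + version).
    if (run.length < 12) continue;
    const header = atob(run.slice(0, 12));
    if (header.length < 8 || !header.startsWith("\0asm")) continue;
    // Bytes 4..7 hold the binary format version, little-endian.
    const version =
      (header.charCodeAt(4) |
        (header.charCodeAt(5) << 8) |
        (header.charCodeAt(6) << 16) |
        (header.charCodeAt(7) << 24)) >>> 0;
    hits.push({ offset: match.index, base64Length: run.length, version });
  }
  return hits;
}
```

Running it over each built .mjs file (read as text) gives a quick inventory, but it will not catch wasm that is compressed, split across several string literals, or encoded some other way.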