[Webkit-unassigned] [Bug 273435] New: PDF.js contains binary code

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Apr 29 14:26:00 PDT 2024


https://bugs.webkit.org/show_bug.cgi?id=273435

            Bug ID: 273435
           Summary: PDF.js contains binary code
           Product: WebKit
           Version: WebKit Nightly Build
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: PDF
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: mcatanzaro at redhat.com
                CC: thorton at apple.com

We should update to PDF.js v4.2.67.

However, this is going to be more complicated than usual because this release contains a binary wasm module for processing JPEG 2000 images: https://github.com/mozilla/pdf.js/pull/17946/files#diff-0c3dc243c4697cac89b08327394744c65a35d3ff8f7e0badcd98f40229c7e2cb. I think that goes too far and we don't want it. The JPEG 2000 image format is old and obscure, and I think it's OK to not fully support PDFs that use it.

It's bad enough that we ship prebuilt JS files instead of the preferred modifiable source, but at least those are still human-readable and not minified or obfuscated or anything (and we can't do much about it, because the build process depends on a large number of node.js modules that we surely don't want WebKit to depend on). Even though probably nobody will ever inspect the built JS code to see what it's doing, at least it's *possible* to do so. That's no longer true if we ship binary code compiled by somebody else. (Interestingly, it *looks* like a text file rather than a binary, because it's base64 encoded and embedded into pdf.worker.mjs.)

Anyway, removing the code that uses the wasm should be easy enough. We can add a patch to sabotage it:

  static decode(data, ignoreColorSpace) {
    this.#module ||= OpenJPEG();
    const imageData = this.#module.decode(data, ignoreColorSpace);
    if (!imageData) {
      throw new JpxError("JPX decode failed");
    }
    return imageData;
  }
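
The method quoted above is the upstream version, which instantiates OpenJPEG on first use. As a rough sketch of what a sabotaged replacement might look like, the body could be reduced to an unconditional throw so the wasm module is never touched. The class name JpxImage, the stand-in JpxError definition, and the error message are assumptions made so the snippet runs on its own; they are not taken from an actual patch.

```javascript
// Stand-in for the JpxError class that PDF.js defines; redefined here only
// so that this sketch is self-contained.
class JpxError extends Error {
  constructor(msg) {
    super(msg);
    this.name = "JpxError";
  }
}

// Assumed class name; only the decode() method matters for this sketch.
class JpxImage {
  // Sabotaged decode(): OpenJPEG() is never called, so the embedded wasm
  // module is never needed at runtime. Callers see the same error type
  // they would get from a failed decode.
  static decode(data, ignoreColorSpace) {
    throw new JpxError("JPX decoding is disabled in this build");
  }
}
```

Throwing the existing error type, rather than returning null or an empty image, should let whatever already handles a failed JPX decode deal with the disabled decoder, though that has not been checked against the real callers.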

We'll also need a script to automatically remove the jpx.js source section from pdf.worker.mjs. (The same script could apply the patch and maybe run some of the other steps that are currently manual.)
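
The core of such a script might be a small text transformation like the one below. This is only a sketch: it assumes the built file separates bundled modules with recognizable marker comments, and both the marker strings and the function name stripSection are hypothetical. The real layout of pdf.worker.mjs would need to be checked first.

```javascript
// Remove one bundled module from a built file.
//   source       - full text of the built file
//   startMarker  - comment line that begins the section to drop (assumed)
//   markerPrefix - prefix shared by all module marker comments (assumed)
function stripSection(source, startMarker, markerPrefix) {
  const start = source.indexOf(startMarker);
  if (start === -1) {
    // Fail loudly so an upstream layout change is noticed during an update.
    throw new Error(`marker not found: ${startMarker}`);
  }
  // The section ends where the next module marker begins, or at end of file.
  const next = source.indexOf(markerPrefix, start + startMarker.length);
  const end = next === -1 ? source.length : next;
  return source.slice(0, start) + source.slice(end);
}
```

Failing when the marker is missing, instead of silently returning the input unchanged, means a future PDF.js update that renames or moves the section cannot slip the binary back in unnoticed.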

I thought the above was a good plan, but then I decided to check to be sure there isn't more wasm binary content in pdf.js. Unfortunately there is: quickjs-eval.js, which is required to implement the PDF.js subsandbox, so removing it would have security implications. :S I'm pretty sure it's unacceptable to have this in WebKit, but we might have to discuss this with upstream before deciding what to do. One possibility is to actually depend on node.js, which would be sad.
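
A check like this could be partly automated. A wasm binary starts with the bytes 00 61 73 6d ("\0asm") followed by the version 01 00 00 00, so a module that is base64 encoded from its first byte begins with the text "AGFzbQEAAA". The sketch below scans a source string for that prefix. The function name findEmbeddedWasm is hypothetical, and the approach assumes the binary is embedded as plain base64 text the way it is in pdf.worker.mjs; a module embedded in some other encoding would not be found.

```javascript
// Base64 encoding of the first 7.5 bytes of the wasm header
// (00 61 73 6d 01 00 00 00), which is the part that does not depend on
// whatever bytes follow the header.
const WASM_BASE64_PREFIX = "AGFzbQEAAA";

// Return the offsets in `source` where a base64-encoded wasm module
// appears to start. An empty array means nothing was detected, not that
// the file is guaranteed to be free of binary content.
function findEmbeddedWasm(source) {
  const offsets = [];
  let index = source.indexOf(WASM_BASE64_PREFIX);
  while (index !== -1) {
    offsets.push(index);
    index = source.indexOf(WASM_BASE64_PREFIX, index + 1);
  }
  return offsets;
}
```

Running this over every file in the imported PDF.js tree during an update would flag new embedded modules of this kind early, before they land in the repository.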
