[webkit-dev] We're doing JPEG parallel decoding, want to contribute code to WebKit

Ryosuke Niwa rniwa at webkit.org
Thu Nov 22 18:46:31 PST 2012

See the following two threads:

In particular, https://bugs.webkit.org/show_bug.cgi?id=90375 appears to
have a work-in-progress patch.

- R. Niwa

On Thu, Nov 22, 2012 at 6:31 PM, Zhang Peixuan <zhangpeixuan.cn at gmail.com> wrote:

> Hello,
>      We are writing a GPU-accelerated version of libjpeg-turbo, and our
> goal is to use it in Chrome and other browsers that use WebKit. We now
> have a beta version that can use the GPU to decode JPEG files in
> Chromium.
>      We still have a lot of work to do. I have learned from the Chromium
> community that there is also an effort underway in WebKit to generalize the
> concept of parallel/asynchronous image decoding.
>      So I would like to know whether we could contribute code.
>      Thanks a lot.
>              Peixuan Zhang
>                 20121123
> The following is the email that I sent to the Chromium community.
> ===================================
> Hello,
> I'm a programmer, and my team and I are writing a GPU-accelerated version
> of libjpeg-turbo, mainly using OpenCL. Our goal is to use it in Chrome. We
> now have a beta version that can use the GPU to decode JPEG files in
> Chromium.
>  However, because we need to load some additional .dll files and APIs
> (e.g. we must load OpenCL.dll), this version must run with the parameter
> --no-sandbox.
> We don't know how to run it without --no-sandbox, so I would really like to
> know how to load additional .dll files and access some registry information
> from inside the sandbox. Is there some way to do it?
>  In addition, we need to do some initialization before using OpenCL, and
> because Chrome is a multi-process application, this initialization has to
> be done in each process, which increases the time cost.
> We have put forward several ideas, and my colleague Peng Xiao has discussed
> them with you in this community. But after some discussion, we concluded
> that these ideas may not be suitable.
> Therefore, we have proposed another solution: if we use a separate process
> to handle JPEG decoding, we won't need to repeat the initialization work.
> I think it would be just like the "--type=gpu-process" process. We could
> decode images using a single process.
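On the per-process initialization cost described above, a minimal sketch of one way to defer it: each process runs the setup only on its first decode, so processes that never decode a JPEG never pay for it. This is an illustration under stated assumptions, not the approach the thread proposes. `initGpuDecoder` and `decodeJpegOnGpu` are hypothetical names, and the real OpenCL setup and the decoding itself are replaced by placeholders.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the expensive OpenCL setup (platform query,
// context creation, kernel compilation). Here it only counts invocations.
static int g_initCount = 0;

static bool initGpuDecoder()
{
    ++g_initCount;
    return true; // Assumption: setup succeeded.
}

// Hypothetical decode entry point. The function-local static is initialized
// exactly once per process, thread-safely (guaranteed since C++11), so only
// the first call pays the setup cost.
bool decodeJpegOnGpu(const unsigned char* data, std::size_t size)
{
    static const bool ready = initGpuDecoder();
    // Actual decoding is omitted from this sketch.
    return ready && data != nullptr && size > 0;
}
```

This does not reduce the number of processes that initialize OpenCL once they all decode images; it only avoids the cost in processes that never do.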
>  We learned that Chrome runs the JPEG decoder in the sandbox, perhaps for
> security reasons, so we don't know whether running all JPEG decoding in one
> process would bring security risks or other problems. Because this part of
> the work is still at the conceptual stage, we do not know whether it is
> worthwhile to go ahead.
>  Yours sincerely.
> =============================================================
> 1. Do you have timing information about how jpeg decoding is a bottleneck
> at the moment? How much % of time is spent in jpeg decoding on rendering?
> According to the libjpeg-turbo-OpenCL work that we have already completed,
> the performance is somewhat better than the original version. Of course, we
> only tested libjpeg-turbo standalone, and there may be some differences in
> Chrome.
> We tested on an AMD A10M 4600M 2.3GHz. On this platform, the OpenCL version
> is 20~70% faster than before (the gain depends on image size and sampling
> ratio). In some cases, it is even 8% faster than an Intel i7-3520M
> 2.9GHz.
> Of course, in many cases the JPEG codec is not the most time-consuming
> thing in browsers, but with the growing popularity of HTML5, image codecs
> will account for a larger and larger share, e.g. there are many JPEG
> textures in WebGL.
>  2. Do you plan to use OpenCL for other things than jpeg decoding?
> Yes, we do have more plans to use OpenCL to accelerate some of the
> features in Chrome. What we're doing includes at least JPEG and FFmpeg, and
> in the future we may do more work on image and video.
>  3. Do you have an idea about the latency introduced by doing that, plus
> the kernel overhead, compared to a completely user-mode solution? There are
> several context switches introduced that would add a constant time to
> decoding an image, which severely affect smaller images. Is it worth
> sending 500 bytes of data to the GPU to be decoded? I don't think so.
> Yes, we have some ideas that can reduce the transfer time between CPU
> and GPU, and we are also trying to reduce the kernel overhead. Some of
> these ideas have matured, but we are waiting for them to be open-sourced.
>  4. The sandbox bypass is a non-starter. Adding yet-another-process is a
> non-starter too. Having a new jpeg decoder significantly increases the
> attack surface so just from a security perspective, I'm not sure it's worth it.
> This is very important: if it would bring a high security risk, the value
> of the work is low.
>  5. Do you have an idea how to do the runtime trade-off of when it's worth
> doing software-only decoding versus offloading to the GPU? What if the
> user's GPU is already saturated but their CPU is idle? At the extreme end,
> let's assume a dual 8-core Xeon with a low-end Intel integrated graphics
> card and two 30" monitors plugged in.
> OK, we are concerned with different things. I think on the AMD Trinity
> APU there may not be such problems; for Intel, I think I need to do some
> additional research.
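The trade-off raised in questions 3 and 5 could be sketched as a simple dispatch rule: fall back to the CPU for small images, where transfer and launch overhead would dominate, and when the GPU is already busy. This is only a sketch under assumptions. `DecodePolicy`, `chooseDecodePath`, and both threshold values are hypothetical and not measured, and how a browser would obtain a GPU load figure is left open.

```cpp
#include <cassert>
#include <cstddef>

enum class DecodePath { Cpu, Gpu };

// All fields and default thresholds are illustrative assumptions,
// not measured values.
struct DecodePolicy {
    std::size_t minGpuBytes = 64 * 1024; // below this, overhead likely dominates
    double maxGpuLoad = 0.8;             // utilisation (0..1) above which the GPU counts as saturated
};

// Picks a decode path from the compressed image size and the current GPU load.
DecodePath chooseDecodePath(std::size_t compressedBytes, double gpuLoad,
                            bool gpuAvailable,
                            const DecodePolicy& policy = DecodePolicy())
{
    if (!gpuAvailable)
        return DecodePath::Cpu;
    if (compressedBytes < policy.minGpuBytes)
        return DecodePath::Cpu; // e.g. the 500-byte image from question 3
    if (gpuLoad > policy.maxGpuLoad)
        return DecodePath::Cpu; // saturated GPU, idle CPU
    return DecodePath::Gpu;
}
```

With these assumed defaults, `chooseDecodePath(500, 0.1, true)` selects the CPU, while a 2 MB image on a lightly loaded GPU selects the GPU. Real thresholds would have to come from measurements on the target hardware.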
>  In addition to what Marc-Antoine said, note that there is also an effort
> underway in WebKit to generalize the concept of parallel/asynchronous image
> decoding. You probably want to sync up with that effort to see what
> overlaps.
> Thanks a lot, I will send an email to the WebKit community for more information.
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo/webkit-dev