[webkit-dev] We're doing JPEG parallel decoding, want to contribute code to WebKit

Thu Nov 22 18:31:50 PST 2012

Hello,
     We are writing a GPU-accelerated version of libjpeg-turbo, and our
goal is to use it in Chrome and other WebKit-based browsers. We have now
written a beta version that uses the GPU to decode JPEG files in
Chromium.
     We still have a lot of work to do, and I have learned from the Chromium
community that there is also an effort underway in WebKit to generalize the
concept of parallel/asynchronous image decoding.
     So I would like to know whether we could contribute code.

     Thanks a lot.

             Peixuan Zhang
                20121123

The following is the email I sent to the Chromium community.

===================================

Hello,

I'm a programmer. My team and I are writing a GPU-accelerated version of
libjpeg-turbo, mainly using OpenCL. Our goal is to use it in Chrome. We have
now written a beta version that uses the GPU to decode JPEG files in
Chromium.

However, because we need to load some additional .dll files and APIs
(e.g. OpenCL.dll), this version must run with the --no-sandbox flag.

We don't know how to run it without --no-sandbox, so I would like to know
how to load additional .dll files and read registry information from inside
the sandbox. Is there a way to do this?

In addition, OpenCL requires some initialization before use, and because
Chrome is a multi-process application, this initialization must be repeated
in each process, which adds time. We have put forward several ideas, and my
colleague Peng Xiao has discussed them with you in this community, but after
that discussion we concluded that these ideas may not be suitable.

Therefore, we have proposed another solution: if we use a separate process
for JPEG decoding, we only need to do the initialization once. I think it
would be similar to the "--type=gpu-process" process: all images would be
decoded in a single process.

We understand that Chrome runs the JPEG decoder in the sandbox, presumably
for security reasons, so we don't know whether running all JPEG decoding in
one process would introduce security risks or other problems. Because this
step of the work is still at the conceptual stage, we do not know whether it
is worthwhile to go ahead.

Yours sincerely.
=============================================================

1. Do you have timing data showing that JPEG decoding is currently a
bottleneck? What percentage of rendering time is spent in JPEG decoding?

Based on the libjpeg-turbo-OpenCL version we have completed so far,
performance is somewhat better than the original version. Note, however,
that we have only tested standalone libjpeg-turbo; results in Chrome may
differ.

We tested on an AMD A10-4600M (2.3 GHz). On this platform, the OpenCL
version is 20% to 70% faster than the original, depending on image size and
chroma subsampling ratio. In some cases, it is even 8% faster than an Intel
i7-3520M (2.9 GHz).

Of course, in many cases the JPEG codec is not the most time-consuming part
of a browser, but as HTML5 becomes more popular, image decoding will account
for a growing share of the work. For example, WebGL content uses many JPEG
textures.

2. Do you plan to use OpenCL for things other than JPEG decoding?

Yes, we plan to use OpenCL to accelerate other features in Chrome as well.
Our current work includes at least JPEG and FFmpeg, and in the future we may
do more work on image and video processing.

3. Do you have an idea about the latency introduced by doing that, plus the
kernel overhead, compared to a completely user-mode solution? Several context
switches would be introduced, adding a constant time to decoding each image,
which would severely affect smaller images. Is it worth sending 500 bytes of
data to the GPU to be decoded? I don't think so.

Yes. We have some ideas for reducing the transfer time between CPU and GPU,
and we are also trying to reduce kernel overhead. Some of these ideas are
mature, but we are waiting for them to be open-sourced.
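As a rough illustration of the small-image concern, here is a minimal sketch of a linear cost model. It assumes GPU decoding pays a fixed per-image overhead (context switches, transfer setup, kernel launch) plus a per-KiB cost, while CPU decoding pays only a per-KiB cost. The names CostModel, break_even_kib and gpu_is_cheaper, and all numbers, are hypothetical placeholders, not measurements from libjpeg-turbo-OpenCL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostModel:
    """Hypothetical linear cost model for decoding one JPEG (all constants assumed)."""
    gpu_fixed_us: float    # fixed per-image GPU overhead: context switches, transfer setup, kernel launch
    gpu_us_per_kib: float  # marginal GPU decode cost per KiB of compressed data
    cpu_us_per_kib: float  # marginal CPU decode cost per KiB of compressed data


def break_even_kib(m: CostModel) -> Optional[float]:
    """Compressed size (KiB) above which the GPU path is cheaper, or None if never."""
    saving_per_kib = m.cpu_us_per_kib - m.gpu_us_per_kib
    if saving_per_kib <= 0:
        # The GPU is never cheaper per KiB, so its fixed overhead is never recovered.
        return None
    return m.gpu_fixed_us / saving_per_kib


def gpu_is_cheaper(m: CostModel, size_kib: float) -> bool:
    """Compare estimated GPU and CPU decode times for one image of the given size."""
    gpu_cost = m.gpu_fixed_us + m.gpu_us_per_kib * size_kib
    cpu_cost = m.cpu_us_per_kib * size_kib
    return gpu_cost < cpu_cost


# Placeholder numbers, chosen only to illustrate the shape of the trade-off.
model = CostModel(gpu_fixed_us=500.0, gpu_us_per_kib=2.0, cpu_us_per_kib=4.5)
print(break_even_kib(model))              # 200.0
print(gpu_is_cheaper(model, 500 / 1024))  # False: a 500-byte image stays on the CPU
```

With these assumed numbers the break-even point is 200 KiB, so a 500-byte image would stay on the CPU; real thresholds would have to be measured on each device.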

4. The sandbox bypass is a non-starter. Adding yet-another-process is a
non-starter too. Having a new jpeg decoder significantly increases the
attack surface, so just from a security perspective, I'm not sure it's worth
it.

This is a very important point. If the work brings a high security risk,
its value is low.

5. Do you have an idea how to do the runtime trade-off between
software-only decoding and offloading to the GPU? What if the user's GPU is
already saturated but the CPU is idle? At the extreme end, let's assume a
dual 8-core Xeon with low-end Intel integrated graphics and two 30"
monitors plugged in.

OK, we are concerned about different things. I think there may not be such
problems on an AMD Trinity APU; for Intel, I need to do some additional
research.
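For the trade-off in question 5, one possible runtime heuristic is sketched below. It assumes some hypothetical monitor reports GPU and CPU utilization as fractions in [0, 1], and it inflates each device's estimated cost by 1 / (1 - load), a simple queueing-style assumption rather than a validated model. The function choose_backend and its cost constants are hypothetical placeholders.

```python
from enum import Enum


class Backend(Enum):
    CPU = "cpu"
    GPU = "gpu"


def choose_backend(size_kib: float,
                   gpu_load: float,
                   cpu_load: float,
                   gpu_fixed_us: float = 500.0,
                   gpu_us_per_kib: float = 2.0,
                   cpu_us_per_kib: float = 4.5) -> Backend:
    """Pick a decode backend for one image (hypothetical heuristic).

    gpu_load and cpu_load are assumed utilization fractions in [0, 1];
    the cost constants are placeholders, not measurements.
    """
    eps = 1e-3  # avoid division by zero when a device reports 100% load
    # Inflate each idle-device estimate by 1 / (1 - load) to model a busy device.
    gpu_cost = (gpu_fixed_us + gpu_us_per_kib * size_kib) / max(1.0 - gpu_load, eps)
    cpu_cost = (cpu_us_per_kib * size_kib) / max(1.0 - cpu_load, eps)
    return Backend.GPU if gpu_cost < cpu_cost else Backend.CPU


print(choose_backend(1000, gpu_load=0.0, cpu_load=0.0))  # Backend.GPU
print(choose_backend(1000, gpu_load=0.9, cpu_load=0.0))  # Backend.CPU
```

Under these assumptions, in the dual-Xeon example a GPU already busy driving two monitors (say, reporting 90% load) would push even a 1000 KiB image back onto the idle CPU.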

In addition to what Marc-Antoine said, note that there is also an effort
underway in WebKit to generalize the concept of parallel/asynchronous image
decoding. You probably want to sync up with that effort to see what
overlaps.

Thanks a lot. I will email the WebKit community for more information.