[webkit-dev] We're doing JPEG parallel decoding, want to contribute code to WebKit

Thu Nov 22 23:10:39 PST 2012

Looks promising to me. However, as you mentioned, using the GPU adds a
constant overhead to every image. You have to handle that trade-off
carefully, because decoding a small JPEG is already cheap on the CPU.
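
A minimal sketch of how that trade-off could be expressed, assuming a simple
linear cost model. Every name here (DecodePath, DecodeCostModel,
chooseDecodePath) is hypothetical and not part of WebKit or libjpeg-turbo,
and the per-pixel and fixed costs would have to come from measurement on the
target hardware:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; none of this is WebKit or libjpeg-turbo API.
enum class DecodePath { Cpu, Gpu };

// Assumed linear cost model. The values must be measured, not guessed.
struct DecodeCostModel {
    double gpuFixedOverheadUs; // constant per-image cost: transfers, kernel launch
    double gpuPerPixelUs;      // marginal GPU cost per pixel
    double cpuPerPixelUs;      // marginal CPU cost per pixel
};

// Picks the GPU only when its estimated total time beats the CPU estimate,
// so small images stay on the CPU where the fixed overhead would dominate.
inline DecodePath chooseDecodePath(std::size_t width, std::size_t height,
                                   const DecodeCostModel& model)
{
    assert(model.gpuFixedOverheadUs >= 0.0);
    const double pixels = static_cast<double>(width) * static_cast<double>(height);
    const double cpuUs = pixels * model.cpuPerPixelUs;
    const double gpuUs = model.gpuFixedOverheadUs + pixels * model.gpuPerPixelUs;
    return gpuUs < cpuUs ? DecodePath::Gpu : DecodePath::Cpu;
}
```

With placeholder costs such as DecodeCostModel{500.0, 0.001, 0.004} (made-up
numbers, not measurements), a 32x32 icon stays on the CPU while a 1920x1080
photo goes to the GPU.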

You can work in parallel with Bug 90375, as Niwa mentioned. Bug 90375 tried
to change the WebKit architecture to decode images off the main thread,
whereas your work optimizes only the decoding operation itself, so the two
approaches are compatible. Note that Bug 90375 was postponed because it
causes images to flash. We think the solution will be general deferred
image decoding in WebKit.

Alpha on the Chromium team is currently working on deferred image decoding
for Chromium:
https://bugs.webkit.org/show_bug.cgi?id=94240

You should take a look at it. I think Alpha will decode images off the main
thread, so you should make sure the OpenCL JPEG decoder can be used from
multiple threads.
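
One way to make a decoder usable from several threads, sketched under
assumptions: expensive process-wide setup is done once behind std::call_once,
and anything mutable is kept per thread. SharedGpuState, ThreadDecoderContext
and the functions below are hypothetical stand-ins; no real OpenCL calls are
made, and the actual decoder may need a different split:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for process-wide state (platform, device, kernels).
struct SharedGpuState {
    std::atomic<int> initCount{0};
};

inline SharedGpuState& sharedGpuState()
{
    static SharedGpuState state;
    static std::once_flag once;
    // The expensive setup runs exactly once, even if many threads race here.
    std::call_once(once, [] { state.initCount.fetch_add(1); });
    return state;
}

// Hypothetical per-thread state (e.g. a command queue and scratch buffers).
struct ThreadDecoderContext {
    int imagesDecoded = 0;
};

inline ThreadDecoderContext& threadContext()
{
    thread_local ThreadDecoderContext context;
    return context;
}

// Placeholder for a real decode; returns how many images this thread decoded.
inline int decodeOnThisThread()
{
    sharedGpuState();
    return ++threadContext().imagesDecoded;
}

// Runs the placeholder decode on several threads and sums the per-thread counts.
inline int runDecodersOnThreads(int threadCount, int imagesPerThread)
{
    std::atomic<int> total{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&total, imagesPerThread] {
            int last = 0;
            for (int j = 0; j < imagesPerThread; ++j)
                last = decodeOnThisThread();
            assert(last == imagesPerThread); // counters are not shared across threads
            total.fetch_add(last);
        });
    }
    for (auto& t : threads)
        t.join();
    return total.load();
}
```

The point of the split is that no lock is needed on the decode path itself:
threads share only read-only state after initialization.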

- Dongsung Huang

2012/11/23 Ryosuke Niwa <rniwa at webkit.org>

> See the following two threads:
> http://lists.webkit.org/pipermail/webkit-dev/2012-August/021820.html
> http://lists.webkit.org/pipermail/webkit-dev/2012-August/021734.html
>
> In particular, https://bugs.webkit.org/show_bug.cgi?id=90375 appears to
> have a work-in-progress patch.
>
> - R. Niwa
>
> On Thu, Nov 22, 2012 at 6:31 PM, Zhang Peixuan <zhangpeixuan.cn at gmail.com> wrote:
>
>> Hello,
>>      We are writing a GPU-accelerated version of libjpeg-turbo, and our
>> goal is to use it in Chrome or other browsers that use WebKit. We now
>> have a beta version that can use the GPU to decode JPEG files in
>> Chromium.
>>      We still have a lot of work to do. I have learned from the
>> Chromium community that there is also an effort underway in WebKit to
>> generalize the concept of parallel/asynchronous image decoding.
>>      So I want to know whether we could contribute code.
>>
>>      Thanks a lot.
>>
>>              Peixuan Zhang
>>                 20121123
>>
>> The following is the email that I have sent to Chromium community.
>>
>> ===================================
>>
>> Hello,
>>
>> I'm a programmer, and my team and I are writing a GPU-accelerated version
>> of libjpeg-turbo, mainly using OpenCL. Our goal is to use it in Chrome.
>> We now have a beta version that can use the GPU to decode JPEG files in
>> Chromium.
>>
>> However, because we need to load some additional .dll files and APIs
>> (e.g. we must load OpenCL.dll), this version must run with the
>> --no-sandbox parameter.
>>
>> We don't know how to run it without --no-sandbox, so I really want to know
>> how to load additional .dll files and access some registry information
>> from inside the sandbox. Is there a way to do it?
>>
>> In addition, we need to do some initialization before using OpenCL, and
>> because Chrome is a multi-process application, that initialization has to
>> be done in each process, which increases the time cost. We have put
>> forward several ideas, and my colleague Peng Xiao has discussed them
>> with you in this community. But after some discussion, we concluded that
>> these ideas may not be suitable.
>>
>> Therefore, we have proposed another solution: if we use a separate
>> process to handle JPEG decoding, we won't need to initialize multiple
>> times. I think it would be just like the "--type=gpu-process" process.
>> We could decode images using a single process.
>>
>> We learned that Chrome runs the JPEG decoder in the sandbox, probably for
>> security reasons, so we don't know whether running all JPEG decoding in
>> one process would bring security risks or other problems. Because this
>> step of the work is still at the conceptual stage, we do not know whether
>> it is worthwhile to go ahead.
>>
>>  Yours sincerely.
>> =============================================================
>>
>> 1. Do you have timing information about how jpeg decoding is a bottleneck
>> at the moment? How much % of time is spent in jpeg decoding on rendering?
>>
>> According to the libjpeg-turbo-OpenCL that we have already completed, the
>> performance is somewhat better than the original version. Of course, we
>> only tested libjpeg-turbo on its own, and the results may differ inside
>> Chrome.
>>
>> We tested on an AMD A10M 4600M 2.3GHz. On this platform, the OpenCL
>> version is 20~70% faster than the original (the gain depends on image
>> size and sampling ratio). And in some cases, it's even 8% faster than an
>> Intel i7-3520M 2.9GHz.
>>
>> Of course, in many cases JPEG decoding is not the most time-consuming
>> thing in a browser, but with the popularity of HTML5, the share of time
>> spent in image codecs will keep growing. For example, there are many
>> JPEG textures in WebGL.
>>
>>  2. Do you plan to use OpenCL for other things than jpeg decoding?
>>
>> Yes, we do have more plans to use OpenCL to accelerate some of the
>> features in Chrome. What we're doing includes at least JPEG and FFmpeg,
>> and in the future we may do more work on image and video.
>>
>>  3. Do you have an idea about the latency introduced by doing that, plus
>> the kernel overhead, compared to a completely user-mode solution? There are
>> several context switches introduced that would add a constant time to
>> decoding an image, which severely affect smaller images. Is it worth
>> sending 500 bytes of data to the GPU to be decoded? I don't think so.
>>
>> Yes, we have some ideas that can reduce the transfer time between the CPU
>> and GPU, and we are also trying to reduce the kernel overhead. Some of
>> these ideas have matured, but we are waiting for them to be open-sourced.
>>
>>  4. The sandbox bypass is a non-starter. Adding yet-another-process is a
>> non-starter too. Having a new jpeg decoder significantly increases the
>> attack surface so just from a security perspective, I'm not sure it's worth it.
>>
>> This is very important. If it would bring a high security risk, the
>> value of the work is low.
>>
>>  5. Do you have an idea how to do the runtime trade off when it's worth
>> doing a software-only decoding versus offloading to the GPU? What if the
>> user has its GPU already saturated but its CPU idle? At the extreme end,
>> let's assume a dual-8-Cores-Xeon with a low-end Intel integrated graphic
>> cards with 2 30" monitors plugged in.
>>
>> OK, we are concerned about different things. I think on the AMD Trinity
>> APU there may not be such problems; for Intel, I think I need to do some
>> additional research.
>>
>>  In addition to what Marc-Antoine said, note that there is also an
>> effort underway in WebKit to generalize the concept of
>> parallel/asynchronous image decoding. You probably want to sync up with
>> that effort to see what overlaps.
>>
>> Thanks a lot. I will email the WebKit community for more information.
>>
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev at lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo/webkit-dev
>>
>>