[webkit-dev] We're doing JPEG parallel decoding, want to contribute code to WebKit

Zhang Peixuan zhangpeixuan.cn at gmail.com
Thu Nov 22 23:44:43 PST 2012

Thanks a lot, and we will check it soon.

2012/11/23 Dong Seong Hwang <luxtella at company100.net>

> Looks promising to me. However, as you mentioned, using the GPU adds a
> constant overhead to every decode. You have to handle that trade-off
> carefully, because decoding a small JPEG is cheap.
> You can work in parallel with Bug 90375, as Niwa mentioned. Bug 90375 tried
> to change the WebKit architecture to decode off the main thread. Your work
> optimizes only the decoding operation itself, so the two approaches can
> coexist.
> In addition, Bug 90375 was postponed because it causes images to flash.
> We think the solution will be general deferred image decoding in WebKit.
> Alpha on the Chromium team is currently working on deferred image decoding
> in Chromium.
> https://bugs.webkit.org/show_bug.cgi?id=94240
> Please take a look. I think Alpha will decode images off the main thread,
> so you should make the OpenCL JPEG decoder usable from multiple threads.
> - Dongsung Huang
> 2012/11/23 Ryosuke Niwa <rniwa at webkit.org>
>> See the following two threads:
>> http://lists.webkit.org/pipermail/webkit-dev/2012-August/021820.html
>> http://lists.webkit.org/pipermail/webkit-dev/2012-August/021734.html
>> In particular, https://bugs.webkit.org/show_bug.cgi?id=90375 appears to
>> have a work-in-progress patch.
>> - R. Niwa
>> On Thu, Nov 22, 2012 at 6:31 PM, Zhang Peixuan <zhangpeixuan.cn at gmail.com> wrote:
>>> Hello,
>>>      We are writing a GPU-accelerated version of libjpeg-turbo, and
>>> our goal is to use it in Chrome or other browsers that use WebKit. We now
>>> have a beta version that can use the GPU to decode JPEG files in
>>> Chromium.
>>>      We still have a lot of work to do. I have learned from the
>>> Chromium community that there is also an effort underway in WebKit to
>>> generalize the concept of parallel/asynchronous image decoding.
>>>      So I would like to know whether we could contribute code.
>>>      Thanks a lot.
>>>              Peixuan Zhang
>>>                 20121123
>>> The following is the email that I sent to the Chromium community.
>>> ===================================
>>> Hello,
>>> I'm a programmer, and my team and I are writing a GPU-accelerated
>>> version of libjpeg-turbo, mainly using OpenCL. Our goal is to
>>> use it in Chrome. We now have a beta version that can use the GPU to
>>> decode JPEG files in Chromium.
>>>  However, because we need to load some additional .dll files and
>>> APIs (e.g. we must load OpenCL.dll), this version must run with the
>>> --no-sandbox parameter.
>>> We don't know how to run it without --no-sandbox, so I really want to know
>>> how to load additional .dll files and read some registry information
>>> inside the sandbox. Is there a way to do that?
>>>  In addition, we need to do some initialization before using
>>> OpenCL, and because Chrome is a multi-process application, that
>>> initialization has to run in each process, which increases the time cost.
>>> We have put forward several ideas, and my colleague Peng Xiao has discussed
>>> them with you in this community. But after some discussion, we concluded
>>> that these ideas may not be suitable.
>>> Therefore, we have proposed another solution: if we use a separate
>>> process to handle JPEG decoding, we won't need to repeat the
>>> initialization work. I think it would be just like the
>>> "--type=gpu-process" process. We could decode images using a single process.
>>>  We understand that Chrome runs the JPEG decoder in the sandbox, probably
>>> for security reasons, so we don't know whether running all JPEG decoding
>>> in one process would introduce security risks or other
>>> problems. Because this part of the work is still at the conceptual stage,
>>> we do not know whether it is worthwhile to go ahead.
>>>  Yours sincerely.
>>> =============================================================
>>> 1. Do you have timing information about how jpeg decoding is a
>>> bottleneck at the moment? How much % of time is spent in jpeg decoding on
>>> rendering?
>>> With the libjpeg-turbo-OpenCL version that we have already completed,
>>> performance is somewhat better than the original version. Of course, we
>>> only tested libjpeg-turbo on its own, and the results may differ in
>>> Chrome.
>>> We tested on AMD A10M 4600M 2.3GHz. On this platform, the OpenCL version
>>> is 20~70% faster than before (the gain depends on image size and
>>> sampling ratio). In some cases, it is even 8% faster than Intel i7-3520M
>>> 2.9GHz.
>>> Of course, in many cases JPEG decoding is not the most time-consuming
>>> task in a browser, but as HTML5 becomes more popular, image codecs will
>>> account for a growing share of the time. For example, there are many JPEG
>>> textures in WebGL.
>>>  2. Do you plan to use OpenCL for other things than jpeg decoding?
>>> Yes, we plan to use OpenCL to accelerate other features in Chrome.
>>> Our current work includes at least JPEG and FFmpeg, and in
>>> the future we may do more work on images and video.
>>>  3. Do you have an idea about the latency introduced by doing that,
>>> plus the kernel overhead, compared to a completely user-mode solution?
>>> There are several context switches introduced that would add a constant
>>> time to decoding an image, which severely affect smaller images. Is it
>>> worth sending 500 bytes of data to the GPU to be decoded? I don't think so.
>>> Yes, we have some ideas for reducing the transfer time between
>>> CPU and GPU, and we are also trying to reduce the kernel overhead. Some of
>>> these ideas have matured, but we are waiting for them to be open-sourced.
>>>  4. The sandbox bypass is a non-starter. Adding yet-another-process is
>>> a non-starter too. Having a new jpeg decoder significantly increases the
>>> attack surface so just from a security perspective, I'm not sure it's
>>> worth it.
>>> That is very important. If it would introduce a high security risk, the
>>> value of the work is low.
>>>  5. Do you have an idea how to make the runtime trade-off between
>>> software-only decoding and offloading to the GPU? What if the
>>> user's GPU is already saturated but their CPU is idle? At the extreme end,
>>> let's assume a dual-8-Cores-Xeon with a low-end Intel integrated graphics
>>> card with 2 30" monitors plugged in.
>>> OK. We have been focusing on different concerns. I think this may not be
>>> a problem on an AMD Trinity APU; for Intel, I need to do some
>>> additional research.
>>>  In addition to what Marc-Antoine said, note that there is also an
>>> effort underway in WebKit to generalize the concept of
>>> parallel/asynchronous image decoding. You probably want to sync up with
>>> that effort to see what overlaps.
>>> Thanks a lot. I will email the WebKit community for more information.
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
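The size trade-off raised in the thread (a constant GPU cost that hurts small
images, and a GPU that may already be saturated) can be illustrated with a
minimal sketch. This is not the decoder discussed above. The class name, the
linear cost model, and every number in it are hypothetical placeholders that
would have to be replaced by measurements on the target hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeCostModel:
    """Hypothetical linear cost model; all values are assumptions, not measurements."""
    gpu_fixed_overhead_us: float  # constant per-image cost (transfers, kernel launch)
    gpu_us_per_kilopixel: float   # marginal GPU decode cost
    cpu_us_per_kilopixel: float   # marginal CPU decode cost

    def gpu_time_us(self, width: int, height: int) -> float:
        kilopixels = width * height / 1000.0
        return self.gpu_fixed_overhead_us + self.gpu_us_per_kilopixel * kilopixels

    def cpu_time_us(self, width: int, height: int) -> float:
        kilopixels = width * height / 1000.0
        return self.cpu_us_per_kilopixel * kilopixels

    def should_use_gpu(self, width: int, height: int, gpu_busy: bool = False) -> bool:
        # Fall back to the CPU when the GPU is reported as saturated.
        if gpu_busy:
            return False
        return self.gpu_time_us(width, height) < self.cpu_time_us(width, height)

    def break_even_pixels(self) -> float:
        # Image size at which both paths cost the same under this model.
        per_kpx_gain = self.cpu_us_per_kilopixel - self.gpu_us_per_kilopixel
        if per_kpx_gain <= 0:
            return float("inf")  # the GPU never wins under this model
        return 1000.0 * self.gpu_fixed_overhead_us / per_kpx_gain


# Placeholder numbers chosen only to make the example concrete.
model = DecodeCostModel(gpu_fixed_overhead_us=500.0,
                        gpu_us_per_kilopixel=1.0,
                        cpu_us_per_kilopixel=3.0)
```

Under these placeholder numbers a small thumbnail stays on the CPU and a large
photo goes to the GPU. How to detect a busy GPU is left open here; the
gpu_busy flag only marks where such a signal would plug in.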