[webkit-dev] Accelerated 2D Tesselation Implementation

Kenneth Russell kbr at google.com
Fri Aug 27 17:32:59 PDT 2010

On Fri, Aug 27, 2010 at 5:10 PM, Chris Marrin <cmarrin at apple.com> wrote:
> On Aug 27, 2010, at 4:21 PM, Kenneth Russell wrote:
>> On Fri, Aug 27, 2010 at 10:18 AM, Chris Marrin <cmarrin at apple.com> wrote:
>>> Hi Ken,
>>> It would help me, and I think many others, if we could have a discussion of
>>> how exactly your tessellation logic works and what it is intended to do. For
>>> instance, is the algorithm you're using based on Loop-Blinn? I'm a bit
>>> familiar with that algorithm and some of the problems it has with rendering
>>> this type of geometry. For instance, there are typically anti-aliasing
>>> artifacts at the points where the interior triangles touch the edges. These
>>> are described in section 5.1 of the paper and the authors added additional
>>> (and not well described) logic to solve the problem.
>>> If you could describe your algorithm a bit and show some expected results
>>> with typical cases, that would be really helpful.
>>> For those not familiar with Loop-Blinn, here is a link to their original
>>> paper, presented at Siggraph 2005:
>>> Resolution Independent Curve Rendering using Programmable Graphics ...
>>> It's a great algorithm for rendering resolution independent 2D objects using
>>> the GPU. It has potential to render both 2D shapes (as used in Canvas and
>>> SVG) and text glyphs. Its advantage is that once you generate the triangles
>>> for the shape, you can render the shape at any resolution. Its disadvantage
>>> is that the triangle generation is quite expensive, so mutating shapes can
>>> potentially be slower than a simpler resolution dependent tessellation.
>> The code that is out for review (see
>> https://bugs.webkit.org/show_bug.cgi?id=44729 ) is, as described in
>> the bug report, precisely an implementation of Loop and Blinn's
>> algorithm. It comes from their simplified reformulation in Chapter 25
>> of GPU Gems 3, which can be found online at
>> http://http.developer.nvidia.com/GPUGems3/gpugems3_ch25.html .
> Ok, great, thanks. That makes everything much more clear.
>> Their GPU Gems chapter discusses the antialiasing artifacts you
>> mention, where interior triangles not handled by their shader touch
>> the boundary of the shape. They point out that enabling multisampling
>> antialiasing solves this problem; the contents of these pixels are
>> antialiased via MSAA, and the pixels covered by their shader are
>> handled by their antialiasing formula. My own experience has been that
>> this works very well: with MSAA enabled and the antialiased version of
>> their shader in use, there are no distinguishable bumps or aliasing
>> artifacts.
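
For concreteness, the quadratic case of their inside/outside test, including the antialiasing term, reduces to something like the following CPU sketch (the names and the scalar setup are illustrative, not the code under review):

```python
import math

def quad_coverage(u, v, dudx, dvdx, dudy, dvdy):
    """Antialiased coverage for one pixel of a quadratic segment.

    Each triangle carries texture coordinates (0,0), (0.5,0), (1,1);
    the curve is the zero set of f(u,v) = u^2 - v, and f < 0 is the
    filled side.  (dudx, dvdx, dudy, dvdy) stand in for the screen-space
    derivatives a fragment shader would obtain from dFdx/dFdy.
    """
    f = u * u - v
    # Screen-space gradient of f via the chain rule.
    fx = 2.0 * u * dudx - dvdx
    fy = 2.0 * u * dudy - dvdy
    # Approximate signed distance to the curve in pixels, then map it
    # through a one-pixel-wide coverage ramp.
    sd = f / max(math.hypot(fx, fy), 1e-12)
    return min(max(0.5 - sd, 0.0), 1.0)
```

Pixels well inside get coverage 1, pixels on the curve get 0.5, and pixels well outside get 0, so the only aliasing left is on the interior-triangle edges, which MSAA handles.
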
> Yeah, this is what I thought might be the solution. The problem with multisampling is that it's not hardware accelerated everywhere, and where it is, it increases expense in both storage and rendering time. I'm not saying this is a showstopper; it's just an issue.
> So here's my concern. If you have to multisample for anti-aliasing anyway, why not just tessellate the shape at an appropriate sampling resolution and multisample the resulting triangle rendering? That would have the disadvantage of not being resolution independent, but it would be much simpler. And you can control the re-tessellation by oversampling the curve and only re-tessellating every few frames (maybe in the background), or only re-tessellating when not animating, or something like that. Would it be significantly faster or slower? Would the quality be better or worse? We'd have to run some tests. I just don't want to jump into a big expensive algorithm when something simpler would suffice.
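
The resolution-dependent flattening you describe would look roughly like this (a sketch with illustrative names; in practice `tol` is tied to the target resolution, which is exactly the dependence at issue):

```python
import math

def flatten_quadratic(p0, p1, p2, tol=0.25, out=None):
    """Recursively flatten a quadratic segment into a polyline.

    The control point's distance from the chord midpoint is used as a
    flatness measure; smaller `tol` (i.e. higher resolution) means more
    output segments, so the mesh must be rebuilt when the scale changes.
    """
    if out is None:
        out = [p0]
    mx, my = (p0[0] + p2[0]) / 2.0, (p0[1] + p2[1]) / 2.0
    if math.hypot(p1[0] - mx, p1[1] - my) <= tol:
        out.append(p2)  # flat enough: emit the chord endpoint
    else:
        # de Casteljau split at t = 1/2 and recurse on both halves.
        a = ((p0[0] + p1[0]) / 2.0, (p0[1] + p1[1]) / 2.0)
        b = ((p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0)
        m = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
        flatten_quadratic(p0, a, m, tol, out)
        flatten_quadratic(m, b, p2, tol, out)
    return out
```
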

I don't have experience comparing against an algorithm that creates
this sort of high-density mesh. However, I do have experience comparing a
full implementation of Loop and Blinn's algorithm to one which only
supported the (simpler) quadratic case. This required cubics to be
approximated with quadratics, and the resulting meshes contained many
more triangles than when cubics were handled directly. For complex
scenes involving many paths, the reduction in triangle count from
supporting cubics improved performance significantly. I imagine the
performance improvements of having a resolution-independent mesh will
be even more pronounced compared to approximating curves with line
segments. The tessellation cost will also be much higher in the latter
case because of the dramatic increase in the number of vertices to
consider. With an algorithm like Kokojima's, which avoids tessellation
entirely, the number of triangles drawn will skyrocket.
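
The quadratic-only experiment amounted to something like the following sketch (not the original code; the midpoint rule used here is one standard way to fit a single quadratic to a cubic piece). Each recursion level doubles the segment count, which is where the extra triangles came from:

```python
def lerp(a, b, t=0.5):
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

def split_cubic(p0, p1, p2, p3):
    """de Casteljau subdivision of a cubic Bezier at t = 1/2."""
    a, b, c = lerp(p0, p1), lerp(p1, p2), lerp(p2, p3)
    d, e = lerp(a, b), lerp(b, c)
    f = lerp(d, e)
    return (p0, a, d, f), (f, e, c, p3)

def cubic_to_quadratics(p0, p1, p2, p3, depth):
    """Halve the cubic `depth` times, then fit one quadratic per piece
    using the midpoint rule q = (3*(p1 + p2) - (p0 + p3)) / 4."""
    if depth == 0:
        q = ((3.0 * (p1[0] + p2[0]) - (p0[0] + p3[0])) / 4.0,
             (3.0 * (p1[1] + p2[1]) - (p0[1] + p3[1])) / 4.0)
        return [(p0, q, p3)]
    left, right = split_cubic(p0, p1, p2, p3)
    return (cubic_to_quadratics(*left, depth - 1) +
            cubic_to_quadratics(*right, depth - 1))
```
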

>> This is actually the second implementation of this algorithm I have
>> done. In the first, we did much more experimentation, including trying
>> Kokojima et al's stencil buffer algorithm rather than tessellating the
>> interior of the shape. We found that the increased fill rate
>> requirements of Kokojima's technique were detrimental to performance,
>> and that it was better to tessellate to reduce overdraw.
> Yeah, I can see that algorithm having some really horrible edge cases.
>> In my first implementation of this algorithm, we found that detecting
>> and resolving overlaps of cubic curve control points (section 25.4 in
>> the GPU Gems 3 chapter) was by far the most expensive part of the
>> algorithm. It is also absolutely necessary in order to handle
>> arbitrary inputs. After some investigation I realized that the
>> plane-sweep algorithm mapped well to the problem of detecting overlaps
>> of control point triangles, and incorporating a variant of this
>> algorithm reduced the curve processing time dramatically. This
>> optimization is included in the code out for review.
>> Currently we have hooked up this code to a 2D Canvas implementation,
>> and are performing no caching; we tessellate each path as it comes in.
>> Performance of the classic SVG butterfly drawn using
>> CanvasRenderingContext2D is similar to that of the software renderer,
>> although I do not have concrete performance numbers yet. For retained
>> mode APIs like SVG where the processing results can be cached easily,
>> I expect significant performance improvements based on previous
>> experience.
> I look forward to seeing some live demos and some performance numbers.
> I really think the best way to proceed is to do the work in a branch. I really want to see the results. But I don't want to commit WebKit to all this code until we know this is the right way forward.
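
To make the overlap-detection step above concrete: the sweep reduces, in a much-simplified bounding-box form (illustrative names, not the code under review, which sweeps triangle extents), to something like:

```python
def overlapping_boxes(boxes):
    """Report index pairs of axis-aligned boxes (xmin, ymin, xmax, ymax)
    that overlap, via a sweep along x: boxes are visited in xmin order,
    boxes whose xmax falls behind the sweep line are retired, and only
    the y-extents of the still-active boxes are compared.  This replaces
    the all-pairs O(n^2) test with work proportional to the number of
    pairs that actually overlap in x."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active, pairs = [], []
    for i in order:
        xmin, ymin, xmax, ymax = boxes[i]
        # Retire boxes that end before the sweep line reaches this one.
        active = [j for j in active if boxes[j][2] >= xmin]
        for j in active:
            if ymin <= boxes[j][3] and boxes[j][1] <= ymax:
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return pairs
```
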

I definitely do not want to commit the WebKit project to using this
particular algorithm and none other. I suspect we will need to
investigate multiple approaches for GPU accelerating path rendering.

That having been said, there is ongoing work to accelerate 2D canvas
rendering using the GPU, and it has been determined that path
rendering is a major bottleneck on some benchmarks. We believe that
this algorithm will help eliminate this bottleneck. In order to
continue our GPU-accelerated canvas work, it is essential that the
work continue on the WebKit trunk, where it is already underway. We
cannot maintain a fixed branch of both the Chromium and WebKit
projects, especially while other GPU infrastructure work is actively
ongoing in both.

I am committed to making this code work well in the WebKit
infrastructure, and am not wedded to the particular algorithm; if a
better one comes along, let's switch to it.

Please work with me to find a way forward that is mutually acceptable.
