[webkit-reviews] review granted: [Bug 180020] Optimize FEMorphology : [Attachment 327591] Patch

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Sun Nov 26 16:34:25 PST 2017


Sam Weinig <sam at webkit.org> has granted Simon Fraser (smfr)
<simon.fraser at apple.com>'s request for review:
Bug 180020: Optimize FEMorphology
https://bugs.webkit.org/show_bug.cgi?id=180020

Attachment 327591: Patch

https://bugs.webkit.org/attachment.cgi?id=327591&action=review




--- Comment #2 from Sam Weinig <sam at webkit.org> ---
Comment on attachment 327591
  --> https://bugs.webkit.org/attachment.cgi?id=327591
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=327591&action=review

> Source/WebCore/ChangeLog:16
> +	   This is about a 4x speedup before the parallel jobs tweaking. With
fixed parallelism,
> +	   a 200x200 filter went from 15ms to about 1ms with these changes.

Noice.

> Source/WebCore/platform/graphics/ColorUtilities.h:100
> +    static ColorComponents fromRGBA(unsigned pixel)
> +    {
> +	   return ColorComponents((pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF,
(pixel >> 8) & 0xFF, pixel & 0xFF);
> +    }

In other places, we represent a pixel value like this with the RGBA class from
Color.h.  Would be good to be consistent if it makes sense.

> Source/WebCore/platform/graphics/ColorUtilities.h:113
> +    unsigned toRGBA() const
> +    {
> +	   return components[0] << 24 | components[1] << 16 | components[2] <<
8 | components[3];
> +    }

Same as above.

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:98
> +ALWAYS_INLINE ColorComponents minOrMax(ColorComponents a, ColorComponents b)
>  {
> -    return (y * width + x) * 4 + colorChannel;
> +    if (type == FEMORPHOLOGY_OPERATOR_ERODE)
> +	   return perComponentMin(a, b);
> +
> +    return perComponentMax(a, b);

perComponentMin/perComponentMax both take const ColorComponents&, but this
takes it by value. Any idea which produces better code?

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:104
> +    ColorComponents extremum =
ColorComponents::fromRGBA(*reinterpret_cast<const
unsigned*>(srcPixelArray.data() + pixelArrayIndex(x, yStart, width)));

Could use auto here.

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:107
> +	   ColorComponents pixel =
ColorComponents::fromRGBA(*reinterpret_cast<const
unsigned*>(srcPixelArray.data() + pixelArrayIndex(x, y, width)));

Auto?

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:126
> +    ColorComponents extremum = kernel[0];

auto?

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:144
> +    const Uint8ClampedArray& srcPixelArray = *paintingData.srcPixelArray;
> +    Uint8ClampedArray& dstPixelArray = *paintingData.dstPixelArray;

auto?

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:155
> +    ColumnExtrema extrema;
> +    extrema.reserveCapacity(2 * radiusX + 1);

You can use extrema.reserveInitialCapacity() here.

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:161
> +	   extrema.resize(0);

This should be extrema.shrink(0);

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:173
> +	       if (x > radiusX)
> +		   extrema.remove(0);

I can't tell how common this is. But in many cases where we need to remove from
the beginning, a Deque can be be the better choice.

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:188
> +    // Empirically, runtime is approximately linear over reasonable kernel
sizes with a slope of about 0.65.

Might want to document how you determined this empirical data.

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:191
> +    static const int minimalArea = (160 * 160); // Empirical data limit for
parallel jobs

Same.

> Source/WebCore/platform/graphics/filters/FEMorphology.cpp:195
>	   ParallelJobs<PlatformApplyParameters>
parallelJobs(&WebCore::FEMorphology::platformApplyWorker, optimalThreadNumber);

No strictly related to this patch, but I checked what we were using for
ParallelJobs, saw that ParallelJobsLibdispatch (which I assume we are using)
uses dispatch_apply with a global queue. I believe using DISPATCH_APPLY_AUTO
for the queue will work better.


More information about the webkit-reviews mailing list