[webkit-reviews] review denied: [Bug 101473] Optimize RGBA4444ToRGBA8 packing/unpacking functions with NEON intrinsics in GraphicsContext3D : [Attachment 173024] patch_v2

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Nov 12 03:36:16 PST 2012


Zoltan Herczeg <zherczeg at webkit.org> has denied Gabor Rapcsanyi
<rgabor at webkit.org>'s request for review:
Bug 101473: Optimize RGBA4444ToRGBA8 packing/unpacking functions with NEON
intrinsics in GraphicsContext3D
https://bugs.webkit.org/show_bug.cgi?id=101473

Attachment 173024: patch_v2
https://bugs.webkit.org/attachment.cgi?id=173024&action=review

------- Additional Comments from Zoltan Herczeg <zherczeg at webkit.org>
View in context: https://bugs.webkit.org/attachment.cgi?id=173024&action=review


> Source/WebCore/WebCore.pri:56
> +    $$SOURCE_DIR/platform/graphics/arm \

Since we already have a gpu directory, I think a cpu/arm directory would be
better. All ARM-specific optimizations could eventually go there instead of
into separate subdirectories, so the filter-specific optimizations could be
moved there later as well.

> Source/WebCore/platform/graphics/arm/GraphicsContext3DNEON.h:44
> +	   uint8x8_t componentR = vqmovn_u16(vshrq_n_u16(eightPixels, 12));
> +	   uint8x8_t componentG = vqmovn_u16(vandq_u16(vshrq_n_u16(eightPixels, 8), constant));
> +	   uint8x8_t componentB = vqmovn_u16(vandq_u16(vshrq_n_u16(eightPixels, 4), constant));
> +	   uint8x8_t componentA = vqmovn_u16(vandq_u16(eightPixels, constant));


This takes six instructions. You can do it with only four by deinterleaving
the input bytes into two uint8x8 arrays and using one ">> 4" or one "& 0xf0"
to extract each component.
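
A per-lane scalar sketch of that suggestion, not NEON code: it assumes the
layout implied by the quoted shifts (R in bits 15..12, A in bits 3..0), so the
high byte of each pixel holds R4G4 and the low byte holds B4A4. The struct and
function names are hypothetical. The mask is 0x0f rather than 0xf0 because
this sketch leaves every component in the low nibble, as the patch does.

```cpp
#include <cstdint>

// Four 4-bit components, each held in the low nibble of a byte.
struct Nibbles { uint8_t r, g, b, a; };

// Models the patch: shift and mask the whole 16-bit pixel.
constexpr Nibbles extractFrom16(uint16_t pixel)
{
    return Nibbles{
        static_cast<uint8_t>(pixel >> 12),
        static_cast<uint8_t>((pixel >> 8) & 0x0f),
        static_cast<uint8_t>((pixel >> 4) & 0x0f),
        static_cast<uint8_t>(pixel & 0x0f)};
}

// Models the suggestion: once the bytes are split, each component
// needs a single shift or a single mask.
constexpr Nibbles extractFromBytes(uint8_t highByte, uint8_t lowByte)
{
    return Nibbles{
        static_cast<uint8_t>(highByte >> 4),
        static_cast<uint8_t>(highByte & 0x0f),
        static_cast<uint8_t>(lowByte >> 4),
        static_cast<uint8_t>(lowByte & 0x0f)};
}
```

Both functions should return the same components for the same pixel; only the
number of operations per component differs.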

> Source/WebCore/platform/graphics/arm/GraphicsContext3DNEON.h:49
> +	   componentR = vorr_u8(vshl_n_u8(componentR, 4), componentR);
> +	   componentG = vorr_u8(vshl_n_u8(componentG, 4), componentG);
> +	   componentB = vorr_u8(vshl_n_u8(componentB, 4), componentB);
> +	   componentA = vorr_u8(vshl_n_u8(componentA, 4), componentA);

Hm, an even better idea:
componentR8 = componentR4G4 & 0xf0
componentG8 = componentR4G4 << 4
So you don't even need to extract the components!
NEON is beautiful magic!
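
A scalar sketch of how this could look per lane, under two assumptions: R sits
in the high nibble of the R4G4 byte (as the quoted ">> 12" implies), and the
nibble still has to be replicated into both halves of the output byte, which
is what the patch's vorr_u8(vshl_n_u8(...)) lines do. With R in the high
nibble, the mask puts R at the top of the byte and the shift puts G there.
All names below are hypothetical.

```cpp
#include <cstdint>

struct Rgba8 { uint8_t r, g, b, a; };

// Expand the high nibble of a packed byte to 8 bits: 0xN? -> 0xNN.
constexpr uint8_t expandHighNibble(uint8_t packed)
{
    return static_cast<uint8_t>((packed & 0xf0) | (packed >> 4));
}

// Expand the low nibble of a packed byte to 8 bits: 0x?N -> 0xNN.
constexpr uint8_t expandLowNibble(uint8_t packed)
{
    return static_cast<uint8_t>(static_cast<uint8_t>(packed << 4) | (packed & 0x0f));
}

// Unpack one RGBA4444 pixel given as its R4G4 and B4A4 bytes.
constexpr Rgba8 unpackPixel(uint8_t r4g4, uint8_t b4a4)
{
    return Rgba8{
        expandHighNibble(r4g4), expandLowNibble(r4g4),
        expandHighNibble(b4a4), expandLowNibble(b4a4)};
}
```

For example, unpackPixel(0x12, 0x34) yields r = 0x11, g = 0x22, b = 0x33 and
a = 0x44, the same values as the patch's (c << 4) | c on the extracted
nibbles.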

> Source/WebCore/platform/graphics/arm/GraphicsContext3DNEON.h:74
> +	   uint8x8x2_t tmp = vzip_u8(componentBA, componentRG);
> +	   uint8x16_t result = vcombine_u8(tmp.val[0], tmp.val[1]);
> +
> +	   vst1q_u16(destination, vreinterpretq_u16_u8(result));

You can simply use an interleaving store here.
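
A scalar model of what such a store does to memory, assuming the comment
refers to a two-register structure store such as vst2_u8, which writes the
lanes of its two inputs alternately. That would replace the vzip_u8 and
vcombine_u8 pair, although the uint16_t destination would need a cast to a
byte pointer. The types and the function name below are hypothetical
stand-ins, not NEON types.

```cpp
#include <cstddef>
#include <cstdint>

struct Lanes8 { uint8_t lane[8]; };   // stands in for a uint8x8_t
struct Bytes16 { uint8_t byte[16]; }; // stands in for 16 destination bytes

// Writes first.lane[i] and second.lane[i] to adjacent bytes, the memory
// layout a two-register interleaving store would produce.
constexpr Bytes16 storeInterleaved(const Lanes8& first, const Lanes8& second)
{
    Bytes16 out{};
    for (std::size_t i = 0; i < 8; ++i) {
        out.byte[2 * i] = first.lane[i];
        out.byte[2 * i + 1] = second.lane[i];
    }
    return out;
}
```

Passing componentBA as the first argument and componentRG as the second gives
the same byte order as the quoted vzip_u8(componentBA, componentRG).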

