[Webkit-unassigned] [Bug 101473] Optimize RGBA4444ToRGBA8 packing/unpacking functions with NEON intrinsics in GraphicsContext3D

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Nov 12 03:36:19 PST 2012


https://bugs.webkit.org/show_bug.cgi?id=101473


Zoltan Herczeg <zherczeg at webkit.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #173024|review?, commit-queue?      |review-
               Flag|                            |




--- Comment #6 from Zoltan Herczeg <zherczeg at webkit.org>  2012-11-12 03:38:00 PST ---
(From update of attachment 173024)
View in context: https://bugs.webkit.org/attachment.cgi?id=173024&action=review

> Source/WebCore/WebCore.pri:56
> +    $$SOURCE_DIR/platform/graphics/arm \

Since we already have a gpu directory, I think a cpu/arm directory would be better. All ARM-specific optimizations could eventually go there instead of into separate subdirectories, so the filter-specific optimizations could be moved there later.

> Source/WebCore/platform/graphics/arm/GraphicsContext3DNEON.h:44
> +        uint8x8_t componentR = vqmovn_u16(vshrq_n_u16(eightPixels, 12));
> +        uint8x8_t componentG = vqmovn_u16(vandq_u16(vshrq_n_u16(eightPixels, 8), constant));
> +        uint8x8_t componentB = vqmovn_u16(vandq_u16(vshrq_n_u16(eightPixels, 4), constant));
> +        uint8x8_t componentA = vqmovn_u16(vandq_u16(eightPixels, constant));

This takes 6 instructions. You can do it using only four, by deinterleaving the input bytes into two uint8x8 arrays and using one ">> 4" or one "& 0xf0" to extract each component.
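To make the byte-level view concrete, here is a minimal scalar sketch of what the deinterleaved vectors would hold, lane by lane. It assumes a little-endian layout in which the low byte of each 16-bit pixel is B4A4 and the high byte is R4G4, as the shifts in the quoted patch imply. The function name split_rgba4444 is hypothetical and not part of the patch. The sketch uses ">> 4" for the high nibble and "& 0x0f" for the low nibble so that each result is a plain 4-bit value, matching what the quoted lines produce before the widening step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the deinterleave idea: treat each RGBA4444 pixel as two
 * bytes (the two streams a deinterleaving load would separate) and take
 * each 4-bit component with a single shift or a single mask.
 * Assumption: little-endian storage, low byte = B4A4, high byte = R4G4. */
void split_rgba4444(const uint8_t *src, size_t pixels,
                    uint8_t *r, uint8_t *g, uint8_t *b, uint8_t *a)
{
    assert(src && r && g && b && a);
    for (size_t i = 0; i < pixels; ++i) {
        uint8_t ba = src[2 * i];      /* even bytes: B high nibble, A low nibble */
        uint8_t rg = src[2 * i + 1];  /* odd bytes:  R high nibble, G low nibble */
        r[i] = (uint8_t)(rg >> 4);    /* one shift */
        g[i] = (uint8_t)(rg & 0x0f);  /* one mask  */
        b[i] = (uint8_t)(ba >> 4);
        a[i] = (uint8_t)(ba & 0x0f);
    }
}
```

In a NEON version the two byte streams would come from one deinterleaving load, and each of the four lines in the loop body would map to one vector instruction.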

> Source/WebCore/platform/graphics/arm/GraphicsContext3DNEON.h:49
> +        componentR = vorr_u8(vshl_n_u8(componentR, 4), componentR);
> +        componentG = vorr_u8(vshl_n_u8(componentG, 4), componentG);
> +        componentB = vorr_u8(vshl_n_u8(componentB, 4), componentB);
> +        componentA = vorr_u8(vshl_n_u8(componentA, 4), componentA);

Hm, an even better idea:
componentR8 = componentR4G4 & 0xf0
componentG8 = componentR4G4 << 4
So you don't even need to extract the components!
NEON is beautiful magic!
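A rough sketch of how this could look end to end, assuming R sits in the high nibble of the R4G4 byte (so the mask yields R and the shift yields G, each already in the upper half of an 8-bit lane). The function name unpack_rgba4444, the byte-pointer signature, and the R, G, B, A output order are assumptions for illustration, not the patch's actual interface. To keep the "0xN0 | 0x0N" widening from the quoted lines, the sketch uses vsri_n_u8 (shift right and insert), which is one possible choice rather than something the review prescribes. A scalar fallback computes the same values when NEON is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Expands RGBA4444 pixels (little-endian bytes: B4A4, then R4G4) into
 * RGBA8 bytes. Hypothetical helper; handles whole groups of 8 pixels. */
void unpack_rgba4444(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    assert(pixels % 8 == 0);
    for (size_t i = 0; i < pixels; i += 8, src += 16, dst += 32) {
#if defined(__ARM_NEON)
        uint8x8x2_t in = vld2_u8(src);          /* val[0] = B4A4, val[1] = R4G4 */
        uint8x8_t mask = vdup_n_u8(0xf0);
        uint8x8_t r = vand_u8(in.val[1], mask); /* R is already in the high nibble */
        uint8x8_t g = vshl_n_u8(in.val[1], 4);  /* shift G up into the high nibble */
        uint8x8_t b = vand_u8(in.val[0], mask);
        uint8x8_t a = vshl_n_u8(in.val[0], 4);
        uint8x8x4_t out;
        /* vsri keeps the high nibble and inserts (x >> 4) below it: 0xN0 -> 0xNN */
        out.val[0] = vsri_n_u8(r, r, 4);
        out.val[1] = vsri_n_u8(g, g, 4);
        out.val[2] = vsri_n_u8(b, b, 4);
        out.val[3] = vsri_n_u8(a, a, 4);
        vst4_u8(dst, out);                      /* stores R, G, B, A interleaved */
#else
        /* Scalar fallback: the same arithmetic, one lane at a time. */
        for (int k = 0; k < 8; ++k) {
            uint8_t ba = src[2 * k], rg = src[2 * k + 1];
            uint8_t r = (uint8_t)(rg & 0xf0), g = (uint8_t)(rg << 4);
            uint8_t b = (uint8_t)(ba & 0xf0), a = (uint8_t)(ba << 4);
            dst[4 * k + 0] = (uint8_t)(r | (r >> 4));
            dst[4 * k + 1] = (uint8_t)(g | (g >> 4));
            dst[4 * k + 2] = (uint8_t)(b | (b >> 4));
            dst[4 * k + 3] = (uint8_t)(a | (a >> 4));
        }
#endif
    }
}
```

For example, the pixel 0xabcd (stored as bytes 0xcd, 0xab) would expand to the four bytes 0xaa, 0xbb, 0xcc, 0xdd.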

> Source/WebCore/platform/graphics/arm/GraphicsContext3DNEON.h:74
> +        uint8x8x2_t tmp = vzip_u8(componentBA, componentRG);
> +        uint8x16_t result = vcombine_u8(tmp.val[0], tmp.val[1]);
> +
> +        vst1q_u16(destination, vreinterpretq_u16_u8(result));

You can simply use a deinterleaved write here.
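One way to read this suggestion is that a structured store such as vst2_u8, which interleaves its two input vectors while writing, can replace the vzip_u8 / vcombine_u8 / vst1q_u16 sequence. The sketch below illustrates that for the packing direction. It is a guess at the shape of the code, not the patch itself: pack_rgba4444 is a hypothetical name, the destination is treated as little-endian bytes rather than uint16_t, and keeping the top four bits of each channel (truncation) is an assumption. A scalar fallback is included for builds without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Packs RGBA8 bytes into RGBA4444 pixels stored as little-endian bytes
 * (B4A4 first, then R4G4). Hypothetical helper; groups of 8 pixels only. */
void pack_rgba4444(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    assert(pixels % 8 == 0);
    for (size_t i = 0; i < pixels; i += 8, src += 32, dst += 16) {
#if defined(__ARM_NEON)
        uint8x8x4_t in = vld4_u8(src);  /* val[0..3] = R, G, B, A */
        uint8x8x2_t out;
        /* vsri(x, y, 4) = (x & 0xf0) | (y >> 4) */
        out.val[0] = vsri_n_u8(in.val[2], in.val[3], 4);  /* B4A4 bytes */
        out.val[1] = vsri_n_u8(in.val[0], in.val[1], 4);  /* R4G4 bytes */
        vst2_u8(dst, out);  /* interleaves the two vectors while storing */
#else
        /* Scalar fallback: the same arithmetic, one lane at a time. */
        for (int k = 0; k < 8; ++k) {
            uint8_t r = src[4 * k + 0], g = src[4 * k + 1];
            uint8_t b = src[4 * k + 2], a = src[4 * k + 3];
            dst[2 * k + 0] = (uint8_t)((b & 0xf0) | (a >> 4));
            dst[2 * k + 1] = (uint8_t)((r & 0xf0) | (g >> 4));
        }
#endif
    }
}
```

With this shape no separate zip or combine step is needed, since the store instruction performs the interleaving.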


