[Webkit-unassigned] [Bug 101473] Optimize RGBA4444ToRGBA8 packing/unpacking functions with NEON intrinsics in GraphicsContext3D
bugzilla-daemon at webkit.org
Mon Nov 12 03:36:19 PST 2012
https://bugs.webkit.org/show_bug.cgi?id=101473
Zoltan Herczeg <zherczeg at webkit.org> changed:
What | Removed | Added
----------------------------------------------------------------------------
Attachment #173024 Flag | review?, commit-queue? | review-
--- Comment #6 from Zoltan Herczeg <zherczeg at webkit.org> 2012-11-12 03:38:00 PST ---
(From update of attachment 173024)
View in context: https://bugs.webkit.org/attachment.cgi?id=173024&action=review
> Source/WebCore/WebCore.pri:56
> + $$SOURCE_DIR/platform/graphics/arm \
Since we already have a gpu directory, I think a cpu/arm directory would be better. All ARM-specific optimizations could eventually go there instead of into separate subdirectories, and the filter-specific optimizations could be moved there later as well.
> Source/WebCore/platform/graphics/arm/GraphicsContext3DNEON.h:44
> + uint8x8_t componentR = vqmovn_u16(vshrq_n_u16(eightPixels, 12));
> + uint8x8_t componentG = vqmovn_u16(vandq_u16(vshrq_n_u16(eightPixels, 8), constant));
> + uint8x8_t componentB = vqmovn_u16(vandq_u16(vshrq_n_u16(eightPixels, 4), constant));
> + uint8x8_t componentA = vqmovn_u16(vandq_u16(eightPixels, constant));
This takes 6 instructions. You can do it with only four by deinterleaving the input bytes into two uint8x8 arrays and using one ">> 4" or one "& 0xf0" to extract each component.
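A minimal scalar sketch of the deinterleaving idea, assuming the pixel layout implied by the quoted shifts (R in bits 15-12, G in 11-8, B in 7-4, A in 3-0) and a little-endian target. It is plain C++ rather than NEON so it can run anywhere; `deinterleavePixels` and `extractNibbles` are hypothetical names. On ARM the first step would presumably be a single deinterleaving load such as `vld2_u8()` over the source bytes, which puts the low bytes (B4A4) in `val[0]` and the high bytes (R4G4) in `val[1]`. The sketch masks with `0x0f` so every result is a plain 4-bit value; for the upper nibble, `& 0xf0` instead of `>> 4` would leave the value in the upper half of the byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the suggested unpacking (not NEON code).
// Step 1: split each RGBA4444 pixel into its low byte (B4A4) and its
// high byte (R4G4). A NEON deinterleaving load would do this for eight
// pixels at once.
void deinterleavePixels(const uint16_t* source, size_t count,
                        uint8_t* componentBA, uint8_t* componentRG)
{
    for (size_t i = 0; i < count; ++i) {
        componentBA[i] = static_cast<uint8_t>(source[i] & 0xff);
        componentRG[i] = static_cast<uint8_t>(source[i] >> 8);
    }
}

// Step 2: a single ">> 4" or "& 0x0f" per component yields each 4-bit
// value, so no 16-bit shift + mask + narrow sequence is needed.
void extractNibbles(const uint8_t* componentBA, const uint8_t* componentRG, size_t count,
                    uint8_t* r, uint8_t* g, uint8_t* b, uint8_t* a)
{
    for (size_t i = 0; i < count; ++i) {
        r[i] = static_cast<uint8_t>(componentRG[i] >> 4);
        g[i] = static_cast<uint8_t>(componentRG[i] & 0x0f);
        b[i] = static_cast<uint8_t>(componentBA[i] >> 4);
        a[i] = static_cast<uint8_t>(componentBA[i] & 0x0f);
    }
}
```

For example, the pixel `0x1234` splits into `componentRG = 0x12` and `componentBA = 0x34`, giving R=1, G=2, B=3, A=4.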
> Source/WebCore/platform/graphics/arm/GraphicsContext3DNEON.h:49
> + componentR = vorr_u8(vshl_n_u8(componentR, 4), componentR);
> + componentG = vorr_u8(vshl_n_u8(componentG, 4), componentG);
> + componentB = vorr_u8(vshl_n_u8(componentB, 4), componentB);
> + componentA = vorr_u8(vshl_n_u8(componentA, 4), componentA);
Hm, an even better idea:
componentR8 = componentR4G4 & 0xf0
componentG8 = componentR4G4 << 4
So you don't even need to extract the components!
NEON is beautiful magic!
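One way to read this suggestion, sketched in scalar C++ with hypothetical helper names: the R4G4 byte (RRRRGGGG) already holds both components, so each 8-bit value can be built directly by copying a nibble into the other half of the byte. This gives the same result as the quoted `(x << 4) | x` on an extracted nibble, i.e. `x * 17`, so `0xF` becomes `0xFF`. The same two helpers apply to the B4A4 byte for B and A. The comments name the NEON shift-and-insert intrinsics `vsri_n_u8` and `vsli_n_u8` as a possible one-instruction mapping; that mapping is an assumption, not something the review states.

```cpp
#include <cassert>
#include <cstdint>

// Input: the high byte of an RGBA4444 pixel, laid out as RRRRGGGG.

// R already sits in the upper nibble: keep it (& 0xf0) and copy it into
// the lower nibble (>> 4). Possible NEON equivalent: vsri_n_u8(rg, rg, 4).
inline uint8_t expandUpperNibble(uint8_t rg)
{
    return static_cast<uint8_t>((rg & 0xf0) | (rg >> 4));
}

// G sits in the lower nibble: move it up (<< 4; the cast drops the bits
// shifted past bit 7) and keep the original lower nibble.
// Possible NEON equivalent: vsli_n_u8(rg, rg, 4).
inline uint8_t expandLowerNibble(uint8_t rg)
{
    return static_cast<uint8_t>((rg << 4) | (rg & 0x0f));
}
```

For `rg = 0xAB`, `expandUpperNibble` yields `0xAA` (R8) and `expandLowerNibble` yields `0xBB` (G8).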
> Source/WebCore/platform/graphics/arm/GraphicsContext3DNEON.h:74
> + uint8x8x2_t tmp = vzip_u8(componentBA, componentRG);
> + uint8x16_t result = vcombine_u8(tmp.val[0], tmp.val[1]);
> +
> + vst1q_u16(destination, vreinterpretq_u16_u8(result));
You can simply use a deinterleaved write here.
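The quoted `vzip_u8` + `vcombine_u8` + `vst1q_u16` sequence writes the bytes BA0, RG0, BA1, RG1, and so on. An interleaving store such as `vst2_u8(reinterpret_cast<uint8_t*>(destination), pair)` writes the same memory layout in one instruction, assuming `destination` is a `uint16_t*` and `pair` is a `uint8x8x2_t` holding `componentBA` then `componentRG` (both assumptions about the patch). Below is a hypothetical scalar model of that store, plus a helper that reads a pixel back as little-endian to show that each 16-bit result is `(RG << 8) | BA`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of an interleaving store (the behaviour of NEON's vst2_u8):
// element i of the first array goes to byte 2*i, element i of the second
// array goes to byte 2*i + 1.
void storeInterleaved(uint8_t* destination, const uint8_t* componentBA,
                      const uint8_t* componentRG, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        destination[2 * i] = componentBA[i];
        destination[2 * i + 1] = componentRG[i];
    }
}

// Reads pixel i back as a 16-bit value with the first byte as the low byte,
// i.e. what a little-endian ARM target sees through a uint16_t pointer.
inline uint16_t loadPixelLittleEndian(const uint8_t* bytes, size_t i)
{
    return static_cast<uint16_t>(bytes[2 * i] | (bytes[2 * i + 1] << 8));
}
```

Storing `componentBA = {0x34, 0xA5}` and `componentRG = {0x12, 0xF0}` produces the bytes `34 12 A5 F0`, which read back as the pixels `0x1234` and `0xF0A5`.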
--
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.