[webkit-reviews] review requested: [Bug 103614] Optimizing RGBA16, RGB16, ARGB16, BGRA16 unpacking functions with NEON intrinsics : [Attachment 180126] patch2
bugzilla-daemon at webkit.org
Wed Dec 19 03:31:31 PST 2012
Gabor Rapcsanyi <rgabor at webkit.org> has asked for review:
Bug 103614: Optimizing RGBA16, RGB16, ARGB16, BGRA16 unpacking functions with
NEON intrinsics
https://bugs.webkit.org/show_bug.cgi?id=103614
Attachment 180126: patch2
https://bugs.webkit.org/attachment.cgi?id=180126&action=review
------- Additional Comments from Gabor Rapcsanyi <rgabor at webkit.org>
(In reply to comment #5)
> (From update of attachment 179710 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=179710&action=review
>
> > Source/WebCore/platform/graphics/cpu/arm/GraphicsContext3DNEON.h:46
> > + uint16x8_t eightComponents = vld1q_u16(source + i);
> > + eightComponents = vshrq_n_u16(eightComponents, 8);
> > + vst1_u8(destination + i, vqmovn_u16(eightComponents));
>
> I think this could be simplified to a simple read/write method without vshr.
> Just read an interleaved low/high component data, and write back the high
> component. Similar algorithm can be created to the other cases.
Yes, thanks, I changed it.
unpackOneRowOfRGBA16LittleToRGBA8 is 3.19x faster now.
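For illustration, here is a scalar C sketch of the read/write idea suggested in the quoted comment. It is not the NEON code from the patch: the helper name unpackHighBytesScalar and its byte-pointer signature are hypothetical, and it assumes the source is a stream of little-endian 16-bit components. A vld2q_u8 load performs the same de-interleave 16 components at a time, with the odd-indexed (high) bytes ending up in val[1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar helper, for illustration only (not from the patch).
 * Assumes sourceBytes holds componentCount little-endian 16-bit components,
 * so byte 2*i is the low half and byte 2*i + 1 is the high half. */
static void unpackHighBytesScalar(const uint8_t* sourceBytes,
                                  uint8_t* destination,
                                  size_t componentCount)
{
    assert(sourceBytes && destination);
    for (size_t i = 0; i < componentCount; ++i) {
        /* Keeping only the high byte gives the same value as (x >> 8),
         * without needing an explicit shift. */
        destination[i] = sourceBytes[2 * i + 1];
    }
}
```

Because RGBA16 to RGBA8 keeps the component order, selecting the high bytes is the whole conversion in this sketch.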
I tried the same with unpackOneRowOfARGB16LittleToRGBA8:
uint8x16x2_t components = vld2q_u8(src + i * 2);
uint32x4_t ARGB = vreinterpretq_u32_u8(components.val[1]);
uint32x4_t RGBA = vorrq_u32(vshrq_n_u32(ARGB, 24), vshlq_n_u32(ARGB, 8));
vst1q_u8(destination + i, vreinterpretq_u8_u32(RGBA));
It was a little bit slower than my original solution.
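As a reference for what the ARGB16 case has to produce, here is a scalar C sketch that takes the high byte of each component and reorders A,R,G,B to R,G,B,A. The helper name unpackARGB16LittleToRGBA8Scalar and its signature are hypothetical, it assumes little-endian 16-bit components, and it is meant as a correctness reference for comparing outputs rather than as either of the NEON variants discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, for illustration only (not from the patch).
 * Each source pixel is 8 bytes: A, R, G, B as little-endian 16-bit values.
 * Each destination pixel is 4 bytes in R, G, B, A order. */
static void unpackARGB16LittleToRGBA8Scalar(const uint8_t* sourceBytes,
                                            uint8_t* destination,
                                            size_t pixelCount)
{
    assert(sourceBytes && destination);
    for (size_t p = 0; p < pixelCount; ++p) {
        const uint8_t* pixel = sourceBytes + 8 * p;
        /* High bytes sit at the odd offsets of each 16-bit component. */
        uint8_t a = pixel[1];
        uint8_t r = pixel[3];
        uint8_t g = pixel[5];
        uint8_t b = pixel[7];
        destination[4 * p + 0] = r;
        destination[4 * p + 1] = g;
        destination[4 * p + 2] = b;
        destination[4 * p + 3] = a;
    }
}
```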