[Webkit-unassigned] [Bug 103614] Optimizing RGBA16, RGB16, ARGB16, BGRA16 unpacking functions with NEON intrinsics
bugzilla-daemon at webkit.org
Wed Dec 19 03:31:31 PST 2012
https://bugs.webkit.org/show_bug.cgi?id=103614
Gabor Rapcsanyi <rgabor at webkit.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #179710|0                           |1
        is obsolete|                            |
 Attachment #180126|                            |review?, commit-queue?
               Flag|                            |
--- Comment #6 from Gabor Rapcsanyi <rgabor at webkit.org> 2012-12-19 03:33:47 PST ---
Created an attachment (id=180126)
--> (https://bugs.webkit.org/attachment.cgi?id=180126&action=review)
patch2
(In reply to comment #5)
> (From update of attachment 179710 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=179710&action=review
>
> > Source/WebCore/platform/graphics/cpu/arm/GraphicsContext3DNEON.h:46
> > + uint16x8_t eightComponents = vld1q_u16(source + i);
> > + eightComponents = vshrq_n_u16(eightComponents, 8);
> > + vst1_u8(destination + i, vqmovn_u16(eightComponents));
>
> I think this could be simplified to a plain read/write method without vshr. Just read the interleaved low/high component bytes and write back only the high bytes. A similar algorithm can be written for the other cases.
Yes, thanks, I changed it.
unpackOneRowOfRGBA16LittleToRGBA8: 3.19x faster now
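For readers following along, here is a minimal sketch of the read/write approach suggested in the review, not the code from the attached patch. The helper name unpack_rgba16_little_to_rgba8 and its signature are hypothetical. The NEON path assumes a little-endian host, so that the low byte of each 16-bit component comes first in memory, and the scalar loop handles the tail as well as builds without NEON.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: keeps only the high byte of each 16-bit component. */
static void unpack_rgba16_little_to_rgba8(const uint16_t* source,
                                          uint8_t* destination,
                                          size_t componentCount)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Assumption: little-endian host, so memory holds low byte, high byte, ... */
    const uint8_t* bytes = (const uint8_t*)source;
    for (; i + 16 <= componentCount; i += 16) {
        /* vld2q_u8 de-interleaves 32 bytes: val[0] = low bytes, val[1] = high bytes. */
        uint8x16x2_t components = vld2q_u8(bytes + i * 2);
        /* Store the 16 high bytes directly; no shift or narrowing is needed. */
        vst1q_u8(destination + i, components.val[1]);
    }
#endif
    /* Scalar tail (and fallback when NEON is not available). */
    for (; i < componentCount; ++i)
        destination[i] = (uint8_t)(source[i] >> 8);
}
```

The de-interleaving load does the work that vshrq_n_u16 plus vqmovn_u16 did in the first version, which is presumably where the saving comes from.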
I tried the same with unpackOneRowOfARGB16LittleToRGBA8:
uint8x16x2_t components = vld2q_u8(src + i * 2);
uint32x4_t ARGB = vreinterpretq_u32_u8(components.val[1]);
uint32x4_t RGBA = vorrq_u32(vshrq_n_u32(ARGB, 8), vshlq_n_u32(ARGB, 24));
vst1q_u8(destination + i, vreinterpretq_u8_u32(RGBA));
It was a little bit slower than my original solution.
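The per-pixel reordering can be checked in isolation with a short sketch like the one below. It is an illustration only, not the patch: the helper name argb8_to_rgba8 is hypothetical, and it starts from pixels already reduced to 8 bits per component. It treats each pixel as a 32-bit word whose least significant byte is the first byte in memory (the little-endian view), so moving alpha from the first byte to the last is a rotation of the word by 8 bits toward the low end.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) && defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#include <arm_neon.h>
#define SKETCH_USE_NEON 1
#endif

/* Hypothetical helper: reorders 8-bit A,R,G,B bytes into R,G,B,A. */
static void argb8_to_rgba8(const uint8_t* argb, uint8_t* rgba, size_t pixelCount)
{
    size_t i = 0;
#ifdef SKETCH_USE_NEON
    for (; i + 4 <= pixelCount; i += 4) {
        /* Four pixels per register; each 32-bit lane is one A,R,G,B pixel. */
        uint32x4_t in = vreinterpretq_u32_u8(vld1q_u8(argb + i * 4));
        /* Rotate each lane: R,G,B move down one byte, A moves to the top byte. */
        uint32x4_t out = vorrq_u32(vshrq_n_u32(in, 8), vshlq_n_u32(in, 24));
        vst1q_u8(rgba + i * 4, vreinterpretq_u8_u32(out));
    }
#endif
    /* Scalar tail and fallback: the same rotation, written without relying on host byte order. */
    for (; i < pixelCount; ++i) {
        const uint8_t* p = argb + i * 4;
        uint8_t* q = rgba + i * 4;
        uint32_t word = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                      | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        uint32_t rotated = (word >> 8) | (word << 24);
        q[0] = (uint8_t)rotated;
        q[1] = (uint8_t)(rotated >> 8);
        q[2] = (uint8_t)(rotated >> 16);
        q[3] = (uint8_t)(rotated >> 24);
    }
}
```

The rotation costs two shifts and an OR per register on top of the load and store, which may be part of why this variant did not beat the original.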