[Webkit-unassigned] [Bug 103614] Optimizing RGBA16, RGB16, ARGB16, BGRA16 unpacking functions with NEON intrinsics

Wed Dec 19 03:31:31 PST 2012

https://bugs.webkit.org/show_bug.cgi?id=103614

Gabor Rapcsanyi <rgabor at webkit.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #179710|0                           |1
        is obsolete|                            |
 Attachment #180126|                            |review?, commit-queue?
               Flag|                            |

--- Comment #6 from Gabor Rapcsanyi <rgabor at webkit.org>  2012-12-19 03:33:47 PST ---
Created an attachment (id=180126)
 --> (https://bugs.webkit.org/attachment.cgi?id=180126&action=review)
patch2

(In reply to comment #5)
> (From update of attachment 179710 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=179710&action=review
> 
> > Source/WebCore/platform/graphics/cpu/arm/GraphicsContext3DNEON.h:46
> > +        uint16x8_t eightComponents = vld1q_u16(source + i);
> > +        eightComponents = vshrq_n_u16(eightComponents, 8);
> > +        vst1_u8(destination + i, vqmovn_u16(eightComponents));
> 
> I think this could be simplified to a plain read/write method without vshr. Just read the interleaved low/high component bytes, and write back only the high bytes. A similar algorithm can be created for the other cases.

Yes, thanks, I changed it.
unpackOneRowOfRGBA16LittleToRGBA8 is 3.19x faster now.
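
For reference, here is a minimal scalar sketch of why the suggested read/write approach gives the same result as the shift. It is not the code from the patch: the helper names narrowByShift and narrowByByteSelect are made up for illustration, and the second one assumes a little-endian target, so that the high byte of each 16-bit component sits at the odd byte offset (the bytes a de-interleaving load such as vld2q_u8 would place in val[1]).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: keeps the high byte of each 16-bit component by shifting,
// which is what the vshrq_n_u16 + narrowing version does per lane.
void narrowByShift(const uint16_t* source, uint8_t* destination, size_t componentCount)
{
    assert(componentCount == 0 || (source && destination));
    for (size_t i = 0; i < componentCount; ++i)
        destination[i] = static_cast<uint8_t>(source[i] >> 8);
}

// Hypothetical helper: views the same data as bytes and copies every second byte.
// ASSUMPTION: little-endian layout, so byte 2*i is the low half and byte 2*i+1
// is the high half of component i.
void narrowByByteSelect(const uint16_t* source, uint8_t* destination, size_t componentCount)
{
    assert(componentCount == 0 || (source && destination));
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(source);
    for (size_t i = 0; i < componentCount; ++i)
        destination[i] = bytes[2 * i + 1];
}
```

On a little-endian machine both helpers should produce identical output for any input row, which is why the shift can be dropped once the load itself separates low and high bytes.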

I tried the same with unpackOneRowOfARGB16LittleToRGBA8:
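  // val[1] holds the high byte of each 16-bit component: A, R, G, B per pixel.
  uint8x16x2_t components = vld2q_u8(src + i * 2);
  uint32x4_t ARGB = vreinterpretq_u32_u8(components.val[1]);
  // Little-endian lanes: A is the lowest byte, so A,R,G,B -> R,G,B,A is a right rotate by 8.
  uint32x4_t RGBA = vorrq_u32(vshrq_n_u32(ARGB, 8), vshlq_n_u32(ARGB, 24));
  vst1q_u8(destination + i, vreinterpretq_u8_u32(RGBA));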

It was a little bit slower than my original solution.
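
To double-check which way a 32-bit lane has to be rotated for this reordering, a scalar model can help. This is only a sketch under stated assumptions, not code from the patch: the helper names are hypothetical, and the lane is assumed to be little-endian (byte 0 in memory is the least significant byte). Under that assumption, turning the byte order A, R, G, B into R, G, B, A is a right rotation by 8 bits.

```cpp
#include <cstdint>

// Hypothetical helper: builds a 32-bit lane from four bytes in memory order,
// modelling a little-endian lane (bytes[0] is the least significant byte).
uint32_t loadLittleEndianLane(const uint8_t bytes[4])
{
    return static_cast<uint32_t>(bytes[0])
        | static_cast<uint32_t>(bytes[1]) << 8
        | static_cast<uint32_t>(bytes[2]) << 16
        | static_cast<uint32_t>(bytes[3]) << 24;
}

// Hypothetical helper: writes a lane back to memory in little-endian byte order.
void storeLittleEndianLane(uint32_t lane, uint8_t bytes[4])
{
    bytes[0] = static_cast<uint8_t>(lane);
    bytes[1] = static_cast<uint8_t>(lane >> 8);
    bytes[2] = static_cast<uint8_t>(lane >> 16);
    bytes[3] = static_cast<uint8_t>(lane >> 24);
}

// Right-rotate by 8 bits: drops A from the lowest byte and re-inserts it as the
// highest byte, so memory order A,R,G,B becomes R,G,B,A.
uint32_t argbLaneToRgbaLane(uint32_t argb)
{
    return (argb >> 8) | (argb << 24);
}
```

Feeding the bytes 0x11, 0x22, 0x33, 0x44 (A, R, G, B) through load, rotate and store gives 0x22, 0x33, 0x44, 0x11 in memory.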
