[Webkit-unassigned] [Bug 73789] Need SSE optimization for SincResampler::Process()

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Dec 5 02:31:56 PST 2011


--- Comment #10 from xingnan.wang at intel.com  2011-12-05 02:31:56 PST ---
(In reply to comment #9)
> > > Personally I prefer to have _mm_load_ps explicitly but that is a question of style I guess...
> > 
> > I tried using _mm_load_ps instead of the cast and got a regression, from about 45% to 38%. I think the extra operations of a function call caused it.
> That is really strange. Intrinsics should not result in function calls, they are supposed to be always inlined. Are you sure you are not testing a debug build?

Yes, you are right, there is no function call. I did some investigation and found that the _mm_load_ps version has 2 additional movaps and 1 additional movl, while the reinterpret_cast version is executed more directly.
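
For reference, the two load styles being compared can be sketched roughly as below. This is only an illustration, not the code from the patch: the helper names (mulAddWithLoad, mulAddWithCast, horizontalSum) are hypothetical, and it assumes the kernel pointer is 16-byte aligned while the input pointer may not be. Whether the two variants compile to different instructions depends on the compiler and optimization flags.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_mul_ps, ...

// Hypothetical helper: accumulate input[0..3] * kernel[0..3] into acc,
// reading the kernel with an explicit aligned-load intrinsic.
inline __m128 mulAddWithLoad(const float* input, const float* kernel, __m128 acc)
{
    __m128 in = _mm_loadu_ps(input);  // input may be unaligned
    __m128 k = _mm_load_ps(kernel);   // assumption: kernel is 16-byte aligned
    return _mm_add_ps(acc, _mm_mul_ps(in, k));
}

// Same computation, but the kernel is read by dereferencing a cast pointer.
// The same alignment assumption applies to kernel.
inline __m128 mulAddWithCast(const float* input, const float* kernel, __m128 acc)
{
    __m128 in = _mm_loadu_ps(input);
    const __m128* k = reinterpret_cast<const __m128*>(kernel);
    return _mm_add_ps(acc, _mm_mul_ps(in, *k));
}

// Sum the four lanes of v; only used to check the result of the helpers above.
inline float horizontalSum(__m128 v)
{
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Both helpers compute the same value, so any difference in run time would come only from the code the compiler generates for the load.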

Here are some details of the test:
sse              run time: 10 s 430768981 ns
sse with mm_load run time: 11 s 386974860 ns
std              run time: 15 s 747296354 ns
sse              speed up = 1.509697
sse with mm_load speed up = 1.382922
