[Webkit-unassigned] [Bug 73789] Need SSE optimization for SincResampler::Process()

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Dec 5 02:43:44 PST 2011


https://bugs.webkit.org/show_bug.cgi?id=73789





--- Comment #11 from xingnan.wang at intel.com  2011-12-05 02:43:44 PST ---
(In reply to comment #10)
> (In reply to comment #9)
> > > > Personally I prefer to have _mm_load_ps explicitly but that is a question of style I guess...
> > > 
> > > I tried using _mm_load_ps instead of the reinterpret_cast approach and saw a regression, from about a 45% speedup to 38%. I think the extra operations of a function call caused it.
> > 
> > That is really strange. Intrinsics should not result in function calls, they are supposed to be always inlined. Are you sure you are not testing a debug build?
> 
> Yes, you are right, there is no function call. I investigated and found 2 additional movaps instructions and 1 additional movl instruction when using _mm_load_ps, while the reinterpret_cast version compiles to more direct code.
> 
> Here are some details of the test:
>  === 
> sse              run time: 10 s 430768981 ns
> sse with mm_load run time: 11 s 386974860 ns
> std              run time: 15 s 747296354 ns
> sse              speed up = 1.509697
> sse with mm_load speed up = 1.382922

Update:

This looks like a GCC optimization issue: when building with -O2, I get the same result with both variants.
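
For reference, the two load styles compared above can be sketched roughly as follows. This is only a minimal illustration, not the actual SincResampler::Process() code: the function names dotLoadPs and dotCast are hypothetical, and both assume 16-byte-aligned inputs and a length that is a multiple of 4.

```cpp
#include <xmmintrin.h>
#include <cstddef>

// Hypothetical helper: dot product using explicit _mm_load_ps loads.
// Assumes a and b are 16-byte aligned and n is a multiple of 4.
float dotLoadPs(const float* a, const float* b, std::size_t n)
{
    __m128 sum = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        sum = _mm_add_ps(sum, _mm_mul_ps(_mm_load_ps(a + i), _mm_load_ps(b + i)));

    // Horizontal sum of the four lanes.
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, sum);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Hypothetical helper: the same computation, but reading the input
// through __m128 pointers obtained with reinterpret_cast.
// Same alignment and length assumptions as above.
float dotCast(const float* a, const float* b, std::size_t n)
{
    const __m128* va = reinterpret_cast<const __m128*>(a);
    const __m128* vb = reinterpret_cast<const __m128*>(b);

    __m128 sum = _mm_setzero_ps();
    for (std::size_t i = 0; i < n / 4; ++i)
        sum = _mm_add_ps(sum, _mm_mul_ps(va[i], vb[i]));

    // Horizontal sum of the four lanes.
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, sum);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Compiling a file like this with g++ -S at different optimization levels (for example with and without -O2) and comparing the generated loops is one way to check whether the two variants end up with different instruction sequences on a given compiler.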
