[Webkit-unassigned] [Bug 73789] Need SSE optimization for SincResampler::Process()
bugzilla-daemon at webkit.org
Mon Dec 5 02:43:44 PST 2011
https://bugs.webkit.org/show_bug.cgi?id=73789
--- Comment #11 from xingnan.wang at intel.com 2011-12-05 02:43:44 PST ---
(In reply to comment #10)
> (In reply to comment #9)
> > > > Personally I prefer to have _mm_load_ps explicitly but that is a question of style I guess...
> > >
> > > I tried using _mm_load_ps instead of the reinterpret_cast approach and saw a regression: the speedup dropped from about 45% to 38%. I think the extra operations of a function call caused it.
> >
> > That is really strange. Intrinsics should not result in function calls, they are supposed to be always inlined. Are you sure you are not testing a debug build?
>
> Yes, you are right, there is no function call. I investigated and found 2 additional movaps instructions and 1 additional movl when using _mm_load_ps, while the reinterpret_cast version compiles to more direct code.
>
> Here are some details of the test:
> ===
> sse run time: 10 s 430768981 ns
> sse with mm_load run time: 11 s 386974860 ns
> std run time: 15 s 747296354 ns
> sse speed up = 1.509697
> sse with mm_load speed up = 1.382922
Further update:
This appears to be a gcc optimization issue. With -O2 I get the same result.
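
For reference, here is a minimal sketch of the two load styles being compared. It is not the code from the patch: the function names dotWithLoadIntrinsic and dotWithCast are hypothetical, and a plain dot product stands in for the resampler's inner loop. Both versions assume 16-byte-aligned inputs and a length that is a multiple of 4.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps, _mm_mul_ps, _mm_add_ps, ...

// Hypothetical helper: sums the 4 lanes of an SSE register.
static float horizontalSum(__m128 v)
{
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Variant 1: explicit aligned loads through the _mm_load_ps intrinsic.
// Assumes a and b are 16-byte aligned and n is a multiple of 4.
float dotWithLoadIntrinsic(const float* a, const float* b, int n)
{
    __m128 sum = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)
        sum = _mm_add_ps(sum, _mm_mul_ps(_mm_load_ps(a + i), _mm_load_ps(b + i)));
    return horizontalSum(sum);
}

// Variant 2: reinterpret the float arrays as arrays of __m128 and
// dereference them directly. Same alignment and length assumptions.
float dotWithCast(const float* a, const float* b, int n)
{
    const __m128* va = reinterpret_cast<const __m128*>(a);
    const __m128* vb = reinterpret_cast<const __m128*>(b);
    __m128 sum = _mm_setzero_ps();
    for (int i = 0; i < n / 4; ++i)
        sum = _mm_add_ps(sum, _mm_mul_ps(va[i], vb[i]));
    return horizontalSum(sum);
}
```

Both functions should return the same value. Whether they compile to the same instructions depends on the compiler and optimization level; one way to check is to build with something like `g++ -O2 -msse -S` and compare the generated assembly of the two loops.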
--
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.