[Webkit-unassigned] [Bug 75528] Optimize the multiply-add in Biquad.cpp::process

Wed Jan 11 09:55:01 PST 2012

https://bugs.webkit.org/show_bug.cgi?id=75528

--- Comment #13 from Raymond Toy <rtoy at chromium.org>  2012-01-11 09:54:53 PST ---
(From update of attachment 121082)
View in context: https://bugs.webkit.org/attachment.cgi?id=121082&action=review

>>> Source/WebCore/platform/audio/Biquad.cpp:93
>>> +    __asm__(
>> 
>> I am far from an expert in sse2, but this seems rather complex.  Could we do something like this?
>> 
>> Create array yy of length 2 (or maybe use the destination array directly?):
>> yy[0] = y
>> yy[1] = y1
>> 
>> load xmm0 with (b0 b1 b2 0)
>> load xmm1 with (-a0 -a1 0 0)
>> load xmm2 from *source to get (x[0] x[1] x[2] x[3])
>> xmm2 = xmm2 * xmm0 to get (b0*x0 b1*x1 b2*x2 0)
>> load xmm3 from yy to get (y0 y1 junk junk)
>> xmm3 = xmm3 * xmm1 to get (-a0*y0 -a1*y1 0 0)
>> xmm3 = xmm3 + xmm2 to get (b0*x0-a0*y0, b1*x1-a1*y1, b2*x2, 0)
>> 
>> Extract each part of xmm3 and add them together.  (We could gain something here if we had SSE3 to do the add, I think.)
>> 
>> yy[0] = yy[1]
>> yy[1] = result of sum.
>> 
>> Don't know if this is faster or slower.  This will change results slightly because we do everything in single precision.
> 
> Your solution may work in single precision; I didn't try it.
> I tried this approach in double precision, but it gave little improvement. I then switched to inline asm so that I could pipeline the instructions and also get the benefit of SSE2.

This looks fine then, especially considering Chris's comment about keeping things in double for stability reasons.
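
For reference, here is a rough sketch of the single-versus-double trade-off discussed above. It assumes a direct form I biquad with a0 normalized to 1, and it emulates the four-lane layout with plain arrays instead of SSE2 intrinsics or inline asm, so it says nothing about speed. BiquadCoeffs, processDouble, processPackedFloat and maxAbsDifference are hypothetical names for illustration, not anything in Biquad.cpp, and the feedback coefficients use the textbook a1/a2 naming rather than the a0/a1 labels in the quoted pseudo-code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical coefficient holder (not WebKit's Biquad class).
struct BiquadCoeffs {
    double b0, b1, b2; // feed-forward coefficients
    double a1, a2;     // feedback coefficients, a0 assumed normalized to 1
};

// Reference: direct form I evaluated entirely in double precision.
std::vector<double> processDouble(const BiquadCoeffs& c, const std::vector<double>& x)
{
    std::vector<double> y(x.size());
    double x1 = 0, x2 = 0, y1 = 0, y2 = 0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        double out = c.b0 * x[n] + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
        x2 = x1; x1 = x[n];
        y2 = y1; y1 = out;
        y[n] = out;
    }
    return y;
}

// Lane-wise variant in single precision. Each 4-element array stands in for
// one packed register; the per-lane loops stand in for packed mul/add.
std::vector<float> processPackedFloat(const BiquadCoeffs& c, const std::vector<double>& x)
{
    // Lanes line up with (x[n-2], x[n-1], x[n], padding).
    const float b[4] = { float(c.b2), float(c.b1), float(c.b0), 0.0f };
    // Negated feedback coefficients line up with (y[n-2], y[n-1], 0, 0).
    const float a[4] = { -float(c.a2), -float(c.a1), 0.0f, 0.0f };

    std::vector<float> y(x.size());
    float xs[4] = { 0.0f, 0.0f, 0.0f, 0.0f }; // input history
    float yy[4] = { 0.0f, 0.0f, 0.0f, 0.0f }; // output history in lanes 0 and 1
    for (std::size_t n = 0; n < x.size(); ++n) {
        xs[0] = xs[1]; xs[1] = xs[2]; xs[2] = float(x[n]);

        float acc[4];
        for (int lane = 0; lane < 4; ++lane)
            acc[lane] = b[lane] * xs[lane] + a[lane] * yy[lane];

        // Horizontal sum of the four lanes.
        float out = (acc[0] + acc[1]) + (acc[2] + acc[3]);

        yy[0] = yy[1];
        yy[1] = out;
        y[n] = out;
    }
    return y;
}

// Largest absolute difference between the two outputs.
double maxAbsDifference(const std::vector<double>& ref, const std::vector<float>& test)
{
    assert(ref.size() == test.size());
    double worst = 0;
    for (std::size_t i = 0; i < ref.size(); ++i)
        worst = std::fmax(worst, std::fabs(ref[i] - double(test[i])));
    return worst;
}
```

Feeding both functions the same input and calling maxAbsDifference shows how far the single-precision output drifts from the double one. For a well-conditioned filter the gap should be tiny, but it can grow as the poles move toward the unit circle, which is the stability concern.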
