[Webkit-unassigned] [Bug 75528] Optimize the multiply-add in Biquad.cpp::process
bugzilla-daemon at webkit.org
Wed Jan 11 09:55:01 PST 2012
https://bugs.webkit.org/show_bug.cgi?id=75528
--- Comment #13 from Raymond Toy <rtoy at chromium.org> 2012-01-11 09:54:53 PST ---
(From update of attachment 121082)
View in context: https://bugs.webkit.org/attachment.cgi?id=121082&action=review
>>> Source/WebCore/platform/audio/Biquad.cpp:93
>>> + __asm__(
>>
>> I am far from an expert in SSE2, but this seems rather complex. Could we do something like this?
>>
>> Create array yy of length 2 (or maybe use the destination array directly?):
>> yy[0] = y
>> yy[1] = y1
>>
>> load xmm0 with (b0 b1 b2 0)
>> load xmm1 with (-a0 -a1 0 0)
>> load xmm2 from *source to get (x[0] x[1] x[2] x[3])
>> xmm2 = xmm2 * xmm0 to get (b0*x0 b1*x1 b2*x2 0)
>> load xmm3 from yy to get (y0 y1 junk junk)
>> xmm3 = xmm3 * xmm1 to get (-a0*y0 -a1*y1 0 0)
>> xmm3 = xmm3 + xmm2 to get (b0*x0-a0*y0, b1*x1-a1*y1, b2*x2, 0)
>>
>> Extract each part of xmm3 and add them together. (We could gain something here if we had SSE3 to do the add, I think.)
>>
>> yy[0] = yy[1]
>> yy[1] = result of sum.
>>
>> Don't know if this is faster or slower. This will change results slightly because we do everything in single precision.
>
> Your solution may work in single precision, but I didn't try it.
> I tried this approach in double precision, but it gave little improvement. I then switched to inline asm so that I could pipeline the instructions and also get the benefit of SSE2.
This looks fine then, especially considering Chris's comment about keeping things in double precision for stability reasons.
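
For reference, the lane-wise single-precision idea quoted above can be sketched roughly as follows, next to a plain double-precision loop for comparison. This is only an illustration under stated assumptions, not the code in the attachment: BiquadCoeffs, processDouble, processPacked and maxAbsDiff are hypothetical names, the coefficients use the common b0/b1/b2/a1/a2 direct-form naming, and the lanes are ordered oldest sample first. SSE intrinsics are used when __SSE__ is defined, with a scalar fallback otherwise.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical coefficient holder (common direct-form naming, an assumption).
struct BiquadCoeffs {
    double b0, b1, b2, a1, a2;
};

// Reference: scalar direct form I, all arithmetic in double precision.
std::vector<double> processDouble(const BiquadCoeffs& c, const std::vector<float>& x)
{
    std::vector<double> y(x.size());
    double x1 = 0, x2 = 0, y1 = 0, y2 = 0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        double out = c.b0 * x[n] + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
        x2 = x1; x1 = x[n];
        y2 = y1; y1 = out;
        y[n] = out;
    }
    return y;
}

// Lane-wise variant: two packed multiplies, one packed add, then a
// horizontal sum. Everything is single precision.
std::vector<float> processPacked(const BiquadCoeffs& c, const std::vector<float>& x)
{
    std::vector<float> y(x.size());
    // Feed-forward lanes pair with (x[n-2], x[n-1], x[n], unused).
    alignas(16) float bLanes[4] = { float(c.b2), float(c.b1), float(c.b0), 0.0f };
    // Feedback lanes pair with (y[n-2], y[n-1], unused, unused), pre-negated.
    alignas(16) float aLanes[4] = { -float(c.a2), -float(c.a1), 0.0f, 0.0f };
    alignas(16) float xx[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    alignas(16) float yy[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    alignas(16) float acc[4];
    for (std::size_t n = 0; n < x.size(); ++n) {
        xx[2] = x[n];
#if defined(__SSE__)
        __m128 ff = _mm_mul_ps(_mm_load_ps(xx), _mm_load_ps(bLanes));
        __m128 fb = _mm_mul_ps(_mm_load_ps(yy), _mm_load_ps(aLanes));
        _mm_store_ps(acc, _mm_add_ps(ff, fb));
#else
        for (int i = 0; i < 4; ++i)
            acc[i] = xx[i] * bLanes[i] + yy[i] * aLanes[i];
#endif
        // Horizontal sum of the four lanes (SSE3 haddps could do this part).
        float out = (acc[0] + acc[1]) + (acc[2] + acc[3]);
        xx[0] = xx[1]; xx[1] = xx[2];   // shift input history
        yy[0] = yy[1]; yy[1] = out;     // shift output history
        y[n] = out;
    }
    return y;
}

// Largest absolute difference between the two outputs.
double maxAbsDiff(const std::vector<double>& a, const std::vector<float>& b)
{
    double m = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        m = std::fmax(m, std::fabs(a[i] - double(b[i])));
    return m;
}
```

Feeding both functions the same input and calling maxAbsDiff gives a rough feel for how much the single-precision path drifts from the double-precision one. The recursion through yy still serializes successive samples, which may be part of why this form showed little gain.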
--
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.