[Webkit-unassigned] [Bug 75528] Optimize the multiply-add in Biquad.cpp::process

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Thu Jan 5 11:07:10 PST 2012


--- Comment #8 from Raymond Toy <rtoy at chromium.org>  2012-01-05 11:07:10 PST ---
(From update of attachment 121082)
View in context: https://bugs.webkit.org/attachment.cgi?id=121082&action=review

> Source/WebCore/platform/audio/Biquad.cpp:90
> +#ifdef __SSE2__

Did you test this on Windows?  I think the asm syntax below only works with GCC, so this conditional should test for GCC too.  (But I guess on Windows, __SSE2__ is probably never defined.)

> Source/WebCore/platform/audio/Biquad.cpp:93
> +    __asm__(

I am far from an expert in SSE2, but this seems rather complex.  Could we do something like this?

Create array yy of length 2 (or maybe use the destination array directly?):
yy[0] = y
yy[1] = y1

load xmm0 with (b0 b1 b2 0)
load xmm1 with (-a0 -a1 0 0)
load xmm2 from *source to get (x[0] x[1] x[2] x[3])
xmm2 = xmm2 * xmm0 to get (b0*x0 b1*x1 b2*x2 0)
load xmm3 from yy to get (y0 y1 junk junk)
xmm3 = xmm3 * xmm1 to get (-a0*y0 -a1*y1 0 0)
xmm3 = xmm3 + xmm2 to get (b0*x0-a0*y0, b1*x1-a1*y1, b2*x2, 0)

Extract each part of xmm3 and add them together.  (We could gain something here if we had SSE3 to do the add, I think.)

yy[0] = yy[1]
yy[1] = result of sum.

I don't know whether this is faster or slower.  It will also change the results slightly, because everything is done in single precision.
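
The steps above could be sketched with intrinsics instead of inline asm, roughly as follows. This is only one reading of the idea, not the code from the patch. The names biquadStep and processBiquad are hypothetical, the feedback coefficients are called a1 and a2 here (an assumption about which terms the -a0/-a1 above refer to), and the samples are copied into small zero-padded arrays rather than loaded straight from *source, so that the unused lanes are zero instead of junk.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: returns the sum over four lanes of
// bLanes[i] * xLanes[i] + negALanes[i] * yLanes[i], all in single precision.
inline float biquadStep(const float* bLanes, const float* negALanes,
                        const float* xLanes, const float* yLanes)
{
#if defined(__SSE2__)
    __m128 b = _mm_loadu_ps(bLanes);
    __m128 a = _mm_loadu_ps(negALanes);
    __m128 x = _mm_loadu_ps(xLanes);
    __m128 y = _mm_loadu_ps(yLanes);
    // (b0*x0 - a1*y1, b1*x1 - a2*y2, b2*x2, 0)
    __m128 sum = _mm_add_ps(_mm_mul_ps(x, b), _mm_mul_ps(y, a));
    // Horizontal add without SSE3: fold the upper half onto the lower half,
    // then add lane 1 to lane 0.
    __m128 upper = _mm_movehl_ps(sum, sum);
    __m128 pair = _mm_add_ps(sum, upper);
    __m128 lane1 = _mm_shuffle_ps(pair, pair, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(pair, lane1));
#else
    // Portable fallback with the same lane layout.
    float sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += bLanes[i] * xLanes[i] + negALanes[i] * yLanes[i];
    return sum;
#endif
}

// Hypothetical driver: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2],
// starting from zero history.
inline void processBiquad(const float* source, float* dest, size_t frames,
                          float b0, float b1, float b2, float a1, float a2)
{
    const float bLanes[4] = { b0, b1, b2, 0 };
    const float negALanes[4] = { -a1, -a2, 0, 0 };
    float xLanes[4] = { 0, 0, 0, 0 }; // (x[n], x[n-1], x[n-2], unused)
    float yLanes[4] = { 0, 0, 0, 0 }; // (y[n-1], y[n-2], unused, unused)

    for (size_t n = 0; n < frames; ++n) {
        // Shift the input history and insert the new sample.
        xLanes[2] = xLanes[1];
        xLanes[1] = xLanes[0];
        xLanes[0] = source[n];

        float y = biquadStep(bLanes, negALanes, xLanes, yLanes);

        // Shift the output history.
        yLanes[1] = yLanes[0];
        yLanes[0] = y;
        dest[n] = y;
    }
}
```

Keeping the padding lanes at zero matters: a junk lane that happened to hold a NaN or an infinity would still give NaN after the multiply by 0.  Whether this beats the scalar loop would have to be measured, since each output depends on the previous one and the horizontal add sits on that dependency chain.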
