[Webkit-unassigned] [Bug 75528] Optimize the multiply-add in Biquad.cpp::process

Tue Mar 20 20:36:18 PDT 2012

https://bugs.webkit.org/show_bug.cgi?id=75528

--- Comment #48 from xingnan.wang at intel.com  2012-03-20 20:36:18 PST ---
(From update of attachment 132760)
Thanks, Ray. Update the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=132760&action=review

>> Source/WebCore/platform/audio/Biquad.cpp:126
>> +        mm7 = _mm_load_ss(sourceP); // Load next x value, mm7 = (0, 0, 0, x)
> 
> We load mm7 from memory here, but use it in the very next instruction.  Could this be moved further up to lessen any pipeline issues?  We're not using the value of mm7 until line 127.

There is little difference of performance after moving this line up, I think it is not in the key path of the pipeline.

>> Source/WebCore/platform/audio/Biquad.cpp:130
>> +        mm5 = _mm_add_sd(mm5, mm6); // mm5 = (x1 * b1 - y2 * a2, x * b0 + x1 * b1 + x2 * b2 - y1 * a1 - y2 * a2) = (*, y)
> 
> Same comment here as for line 128.  We don't care what's in the high part of mm5.
> 
> Also, in line 129, we swap the high and low parts of mm6.  Isn't it possible to do that in line 125 where we assign mm5 to mm6?  Just shuffle mm5 to the right parts of mm6?  That's one less instruction, but I don't know what the pipeline affects will be.

It seems we should do that and make us to save one instruction, but it pulls down the performance after that, strange~ 
I guess the compiler does much for us when we use intrinsic, the actual assembly code looks much different from what I expect for after  "gcc -s " .

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.