<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">cc’ing webkit dev because other folks may be interested in this information and/or may be able to help.<div><br></div><div><div><div>On Apr 1, 2014, at 6:57 AM, Tomas Popela <<a href="mailto:tpopela@redhat.com">tpopela@redhat.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<div>
Hi Mark,<br>
if you don't mind, I have some questions for you regarding <a href="https://bugs.webkit.org/show_bug.cgi?id=128743">https://bugs.webkit.org/show_bug.cgi?id=128743</a> (CLoop crashes on PPC64, S390X), as I'm lost in it. While looking into it I came across <a href="https://bugs.webkit.org/show_bug.cgi?id=97586#c10">https://bugs.webkit.org/show_bug.cgi?id=97586#c10</a>, where you suggest modifying llint/LowLevelInterpreter.cpp to add some debug prints. I tried it and saw that t0.i32 != t0.i. What could that mean?<br>
<br>
&t0.i32 = 0x3fffd709693c<br>
&t0.i = 0x3fffd7096938<br>
Setting t0.i = 0xfedcba9876543211;<br>
raw t0 = 0xfedcba9876543211<br>
Setting t0.i32 = 0x00000000;<br>
raw t0 = 0xfedcba9800000000<br>
<br>
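<div>A note on the t0 output above: it matches what a correctly padded big-endian layout would produce. The 32-bit view has to alias the low-order half of the 64-bit value, which on a big-endian machine sits 4 bytes past the start of the slot, and zeroing it leaves 0xfedcba9800000000. Below is a minimal standalone sketch of that behaviour. It is not WebKit code, and the helper names (isLittleEndian, lowHalfOffset, storeLow32, checkStoreLow32) are made up for illustration.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the least significant byte is stored first.
inline bool isLittleEndian()
{
    const uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Byte offset at which a 32-bit view aliases the low-order half of a
// 64-bit slot: 0 on little-endian, 4 on big-endian.
inline std::size_t lowHalfOffset()
{
    return isLittleEndian() ? 0 : 4;
}

// Overwrite the low-order 32 bits of a 64-bit slot through memory,
// the way a store to a correctly padded i32 member would.
inline uint64_t storeLow32(uint64_t slot, uint32_t value)
{
    unsigned char bytes[sizeof(slot)];
    std::memcpy(bytes, &slot, sizeof(slot));
    std::memcpy(bytes + lowHalfOffset(), &value, sizeof(value));
    std::memcpy(&slot, bytes, sizeof(slot));
    return slot;
}

// Mirrors the debug output above: the high half survives on any byte order.
inline void checkStoreLow32()
{
    assert(storeLow32(0xfedcba9876543211ULL, 0) == 0xfedcba9800000000ULL);
}
```

<div>On the ppc64 box lowHalfOffset() would return 4, which agrees with &t0.i32 being 4 bytes above &t0.i in the output. So this particular check does not by itself point at a bug.</div>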
One more thing: I managed to compile trunk on that PPC64 machine. When I open jsc and just type "1", it crashes as well. I modified the *.asm files to print the names of macros and labels, so that I can see the control flow, with:<br>
<br>
<pre>$ sed 's@^\(\..*\):$@&\n cloopDo // printf ("\1\\n");@' LowLevelInterpreter.asm
$ sed 's@^macro \(.*\).*$@&\n cloopDo // printf ("\1\\n");@' LowLevelInterpreter.asm
</pre>
</div></blockquote><div><div>FYI, there’s already some built-in tracing functionality for the C loop LLINT. In LowLevelInterpreter.cpp, search for TRACE_OPCODE. You can define ENABLE_TRACE_OPCODE 1 to enable that code. Similar to the tracing that you’ve added, it will tell you which opcodes are being executed, but only at opcode granularity.</div><div><br></div></div><br><blockquote type="cite"><div>and the same for LowLevelInterpreter64.asm. The flow differs between ppc64 and x86_64:<br>
<br>
<pre>[<a href="mailto:tpopela@ibm-power7r2-01">tpopela@ibm-power7r2-01</a> bin]$ ./jsc
>>> 1
&t0.i32 = 0x3ffff1982c2c
&t0.i = 0x3ffff1982c28
Setting t0.i = 0xfedcba9876543211;
raw t0 = 0xfedcba9876543211
Setting t0.i32 = 0x00000000;
raw t0 = 0xfedcba9800000000
doCallToJavaScript(makeCall)
callToJavaScriptPrologue()
checkStackPointerAlignment(tempReg, location)
.stackHeightOK
.copyHeaderLoop
.copyHeaderLoop
.copyHeaderLoop
.copyHeaderLoop
.copyHeaderLoop
.copyArgs
.copyArgsLoop
Segmentation fault
</pre>
<br>
When I do the same on my x86_64 machine, the control flow is different.<br>
<br>
<pre> tpopela ~/dev/WebKit/WebKitBuild/Debug/bin > 14:24 [master]
$ ./jsc
>>> 1
doCallToJavaScript(makeCall)
callToJavaScriptPrologue()
checkStackPointerAlignment(tempReg, location)
.stackHeightOK
.copyHeaderLoop
.copyHeaderLoop
.copyHeaderLoop
.copyHeaderLoop
.copyHeaderLoop
.fillExtraArgsLoop <-------------------------
.copyArgs
.copyArgsLoop
.copyArgsDone
checkStackPointerAlignment(tempReg, location)
makeJavaScriptCall(entry, temp)
prologue(codeBlockGetter, codeBlockSetter, osrSlowPath, traceSlowPath)
preserveCallerPCAndCFR()
notFunctionCodeBlockGetter(targetRegister)
notFunctionCodeBlockSetter(sourceRegister)
moveStackPointerForCodeBlock(codeBlock, scratch)
dispatch(advance)
jumpToInstruction()
traceExecution()
checkStackPointerAlignment(tempReg, location)
.opEnterDone
callSlowPath(slowPath)
prepareStateForCCall()
cCall2(function, arg1, arg2)
checkStackPointerAlignment(tempReg, location)
restoreStateAfterCCall()
dispatch(advance)
jumpToInstruction()
traceExecution()
loadConstantOrVariable(index, value)
.constant
.done
dispatch(advance)
jumpToInstruction()
traceExecution()
loadConstantOrVariable(index, value)
.constant
.done
dispatch(advance)
jumpToInstruction()
traceExecution()
checkSwitchToJITForEpilogue()
checkSwitchToJIT(increment, action)
assertNotConstant(index)
assert(assertion)
doReturn()
restoreCallerPCAndCFR()
checkStackPointerAlignment(tempReg, location)
.calleeFramePopped
checkStackPointerAlignment(tempReg, location)
callToJavaScriptEpilogue()
1
</pre>
</div></blockquote><div><br></div><div>Looking at .fillExtraArgsLoop in LowLevelInterpreter64.asm, I see that it is part of the loop that starts at .copyHeaderLoop.</div><div>I suggest you add some “cloopDo // printf(“…”);” tracing to the instructions there and inspect the CLoopRegisters to see what values they contain and why the difference exists. </div><div><br></div><div>You can also add “cloopDo // myTestFunction();” and set a gdb breakpoint in myTestFunction(). Then step through the code to see how it works, and compare x86_64 with ppc64. </div><div><br></div><br><blockquote type="cite"><div>
When it crashes on ppc64 it is crashing here:<br>
<br>
<pre>106         pcBase.i64 = *CAST<int64_t*>(t3.i8p + (pc.i << 3)); // /home/tpopela/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:225
(gdb) p t3
$1 = {{i = 0, u = 0, {i32padding = 0, i32 = 0}, {u32padding = 0, u32 = 0}, {i8padding = "\000\000\000\000\000\000", i8 = 0 '\000'}, {u8padding = "\000\000\000\000\000\000", u8 = 0 '\000'}, ip = 0x0, i8p = 0x0, vp = 0x0, callFrame = 0x0, execState = 0x0, instruction = 0x0, vm = 0x0, cell = 0x0, protoCallFrame = 0x0, nativeFunc = 0x0, i64 = 0, u64 = 0, encodedJSValue = 0, castToDouble = 0, opcode = 0x0}}
</pre>
As you can see, the t3 register is completely zero, so it crashes. The problem is that it went on to copy the args, because of this check:<br>
<br>
<pre> if (pc.i32 == 0) // /home/tpopela/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:223
goto _offlineasm_doCallToJavaScript__copyArgsDone;
</pre>
ppc64:
<pre> (gdb) p pc
$1 = {{i = 4294967295, u = 4294967295, {i32padding = 0, <b>i32 = -1</b>}, {u32padding = 0, u32 = 4294967295}, {i8padding = "\000\000\000\000\377\377\377",
i8 = -1 '\377'}, {u8padding = "\000\000\000\000\377\377\377", u8 = 255 '\377'}, ip = 0xffffffff, i8p = 0xffffffff <Address 0xffffffff out of bounds>,
vp = 0xffffffff, callFrame = 0xffffffff, execState = 0xffffffff, instruction = 0xffffffff, vm = 0xffffffff, cell = 0xffffffff,
protoCallFrame = 0xffffffff, nativeFunc = 0xffffffff, i64 = 4294967295, u64 = 4294967295, encodedJSValue = 4294967295,
castToDouble = 2.1219957904712067e-314, opcode = 0xffffffff}}
</pre>
x86_64:
<pre> (gdb) p pc
$1 = {{i = 0, u = 0, {<b>i32 = 0</b>, i32padding = 0}, {u32 = 0, u32padding = 0}, {i8 = 0 '\000',
i8padding = "\000\000\000\000\000\000"}, {u8 = 0 '\000', u8padding = "\000\000\000\000\000\000"},
ip = 0x0, i8p = 0x0, vp = 0x0, callFrame = 0x0, execState = 0x0, instruction = 0x0, vm = 0x0,
cell = 0x0, protoCallFrame = 0x0, nativeFunc = 0x0, i64 = 0, u64 = 0, encodedJSValue = 0,
castToDouble = 0, opcode = 0x0}}
</pre>
Also, as you can see, on x86_64 it enters the .fillExtraArgsLoop label, which depends on this if statement:<br>
<br>
<pre> if (pc.i32 == pcBase.i32) // /home/tpopela/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:209
goto _offlineasm_doCallToJavaScript__copyArgs;
</pre>
<br>
At the time this statement is executed, the values are:<br>
<br>
ppc64:
<pre>(gdb) p pc
$1 = {{i = 4294967295, u = 4294967295, {i32padding = 0, <b>i32 = -1</b>}, {u32padding = 0, u32 = 4294967295}, {i8padding = "\000\000\000\000\377\377\377",
i8 = -1 '\377'}, {u8padding = "\000\000\000\000\377\377\377", u8 = 255 '\377'}, ip = 0xffffffff, i8p = 0xffffffff <Address 0xffffffff out of bounds>,
vp = 0xffffffff, callFrame = 0xffffffff, execState = 0xffffffff, instruction = 0xffffffff, vm = 0xffffffff, cell = 0xffffffff,
protoCallFrame = 0xffffffff, nativeFunc = 0xffffffff, i64 = 4294967295, u64 = 4294967295, encodedJSValue = 4294967295,
castToDouble = 2.1219957904712067e-314, opcode = 0xffffffff}}
(gdb) p pcBase
$2 = {{i = 4294967295, u = 4294967295, {i32padding = 0, <b>i32 = -1</b>}, {u32padding = 0, u32 = 4294967295}, {i8padding = "\000\000\000\000\377\377\377",
i8 = -1 '\377'}, {u8padding = "\000\000\000\000\377\377\377", u8 = 255 '\377'}, ip = 0xffffffff, i8p = 0xffffffff <Address 0xffffffff out of bounds>,
vp = 0xffffffff, callFrame = 0xffffffff, execState = 0xffffffff, instruction = 0xffffffff, vm = 0xffffffff, cell = 0xffffffff,
protoCallFrame = 0xffffffff, nativeFunc = 0xffffffff, i64 = 4294967295, u64 = 4294967295, encodedJSValue = 4294967295,
castToDouble = 2.1219957904712067e-314, opcode = 0xffffffff}}
</pre>
</div></blockquote><div><br></div><div>I’m not sure what the values should be, but your pc and pcBase look suspicious to me. I recommend you use a debugger (e.g. gdb) and step through CLoop::execute(). You can ignore the first time CLoop::execute() is called. That is just the initialization pass to init the opcode map. On the second entry to CLoop::execute(), step through the code and see how pc and pcBase are computed.</div><div><br></div><div><br></div><br><blockquote type="cite"><div>
x86_64:
<pre>(gdb) p pc
$1 = {{i = 0, u = 0, {<b>i32 = 0</b>, i32padding = 0}, {u32 = 0, u32padding = 0}, {i8 = 0 '\000', i8padding = "\000\000\000\000\000\000"}, {u8 = 0 '\000',
u8padding = "\000\000\000\000\000\000"}, ip = 0x0, i8p = 0x0, vp = 0x0, callFrame = 0x0, execState = 0x0, instruction = 0x0, vm = 0x0, cell = 0x0,
protoCallFrame = 0x0, nativeFunc = 0x0, i64 = 0, u64 = 0, encodedJSValue = 0, castToDouble = 0, opcode = 0x0}}
(gdb) p pcBase
$2 = {{i = 1, u = 1, {<b>i32 = 1</b>, i32padding = 0}, {u32 = 1, u32padding = 0}, {i8 = 1 '\001', i8padding = "\000\000\000\000\000\000"}, {u8 = 1 '\001',
u8padding = "\000\000\000\000\000\000"}, ip = 0x1, i8p = 0x1 <Address 0x1 out of bounds>, vp = 0x1, callFrame = 0x1, execState = 0x1,
instruction = 0x1, vm = 0x1, cell = 0x1, protoCallFrame = 0x1, nativeFunc = 0x1, i64 = 1, u64 = 1, encodedJSValue = 1,
castToDouble = 4.9406564584124654e-324, opcode = 0x1}}
</pre>
<br>
As you can see, the values differ on x86_64, so it does not jump directly to .copyArgs as it does on ppc64.<br>
<br>
Also, I thought that the values returned by<br>
<br>
<pre>loadi ProtoCallFrame::argCountAndCodeOriginValue[protoCallFrame], temp2
loadi ProtoCallFrame::paddedArgCount[protoCallFrame], temp3
</pre>
<br>
should be the same on both architectures, since they describe the current command (in this case "1").<br>
<br></div></blockquote><blockquote type="cite"><div><pre>loadi ProtoCallFrame::argCountAndCodeOriginValue[protoCallFrame], temp2
</pre>
on ppc64:
<pre>78         pc.u = *CAST<uint32_t*>(t2.i8p + 24); // /home/tpopela/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:204
(gdb) n
79         pc.i32 = pc.i32 - int32_t(0x1); // /home/tpopela/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:205
(gdb) p pc.u
<b>$3 = 0</b>
</pre>
on x86_64:
<pre>78         pc.u = *CAST<uint32_t*>(t2.i8p + 24); // /home/tpopela/dev/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:204
(gdb) n
79         pc.i32 = pc.i32 - int32_t(0x1); // /home/tpopela/dev/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:205
(gdb) p pc.u
<b>$1 = 1</b>
</pre>
<br>
<pre>loadi ProtoCallFrame::paddedArgCount[protoCallFrame], temp3
</pre>
on ppc64:
<pre>81         pcBase.u = *CAST<uint32_t*>(t2.i8p + 40); // /home/tpopela/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:206
(gdb) n
82         pcBase.i32 = pcBase.i32 - int32_t(0x1); // /home/tpopela/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:207
(gdb) p pcBase.u
<b>$4 = 0</b>
</pre>
on x86_64:
<pre>81         pcBase.u = *CAST<uint32_t*>(t2.i8p + 40); // /home/tpopela/dev/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:206
(gdb) n
82         pcBase.i32 = pcBase.i32 - int32_t(0x1); // /home/tpopela/dev/WebKit/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:207
(gdb) p pcBase.u
<b>$2 = 2</b>
</pre>
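<div>One pattern that would produce exactly these numbers (0 instead of 1, and 0 instead of 2) is a 32-bit load from the start of a 64-bit slot: on little-endian that reads the low-order half, on big-endian the high-order half. Whether the ProtoCallFrame fields are actually stored that way is an assumption here, not something this thread confirms. The sketch below only demonstrates the mechanism and the follow-on arithmetic; it is not WebKit code, and load32At, payloadOffset and decrementLow32 are hypothetical names.</div>

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the least significant byte is stored first.
inline bool isLittleEndian()
{
    const uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Emulates a 32-bit load from byteOffset (0 or 4) inside a 64-bit slot.
inline uint32_t load32At(uint64_t slot, std::size_t byteOffset)
{
    unsigned char bytes[sizeof(slot)];
    std::memcpy(bytes, &slot, sizeof(slot));
    uint32_t result;
    std::memcpy(&result, bytes + byteOffset, sizeof(result));
    return result;
}

// Offset of the low-order 32 bits within the slot on this host.
inline std::size_t payloadOffset()
{
    return isLittleEndian() ? 0 : 4;
}

// Emulates "pc.i32 = pc.i32 - 1" on a register whose upper half is zero
// after the 32-bit load: the result wraps within 32 bits and is then
// seen zero-extended through the 64-bit view.
inline uint64_t decrementLow32(uint32_t loaded)
{
    uint32_t wrapped = loaded - 1u;
    return wrapped;
}
```

<div>With a slot holding 1, load32At(1, payloadOffset()) returns 1 on either byte order, while a load from the other half returns 0. Feeding that 0 into decrementLow32 gives 4294967295, the same value gdb shows for pc.i on ppc64, so the i32 view is -1 and a "== 0" test no longer matches.</div>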
So I'm not sure whether that means something.<br>
<br>
I uploaded LowLevelInterpreter64.asm, LowLevelInterpreter.asm and LLIntAssembly.h for x86_64 and ppc64 on <a href="http://tpopela.fedorapeople.org/">http://tpopela.fedorapeople.org/</a><br>
<br>
Can you tell me whether any of the concerns I mentioned above are valid?<br></div></blockquote><div><br></div>I’m not sure. You should look in Interpreter.cpp for the initialization of ProtoCallFrame, and work backward from there. I’m not able to tell off the top of my head whether these should be the same or not. If there are platform differences, they should be captured by some abstraction, e.g. a symbolic constant or a function that returns the value.<div><blockquote type="cite"><div></div></blockquote></div><div><br></div><div>Regards,</div><div>Mark</div><div><br></div><br><blockquote type="cite"><div>
Thank you<br>
<br>
Tom
</div>
</blockquote></div><br></div></body></html>