[Webkit-unassigned] [Bug 159827] run-webkit-tests should trigger a spindump when WebContent process is unresponsive

Wed Jul 20 22:55:04 PDT 2016

https://bugs.webkit.org/show_bug.cgi?id=159827

--- Comment #18 from Daniel Bates <dbates at webkit.org> ---
Comment on attachment 284131
  --> https://bugs.webkit.org/attachment.cgi?id=284131
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284131&action=review

> Tools/ChangeLog:7
> +

Please explain the motivation for this change. From what I can tell the spindump(8) report is the concatenation of sample(1)ing all running processing plus some additional data. I take it that our interest in a spindump report is to capture the state of other WebKit processes (e.g. the Network process) when a WebContent process hangs to help diagnose a hard-to-reproduce test failure.

> Tools/Scripts/webkitpy/port/driver.py:192
> +        if driver_input.test_name.startswith('http://') or driver_input.test_name.startswith('https://')  or driver_input.test_name.startswith('file:///'):
> +            self._test_name = self.uri_to_test(driver_input.test_name)
> +        else:
> +            self._test_name = driver_input.test_name

See my remark in _check_for_spindump_trigger() and in run_spindump() (defined in ios.py) that suggest making use of the existing machinery for sample reports. If we can make use of this machinery then we can remove this code.

> Tools/Scripts/webkitpy/port/driver.py:440
> +        if not error_line.startswith("#TRIGGER SPINDUMP - "):
> +            return False
> +
> +        match = re.match('#TRIGGER SPINDUMP - (\S+)', error_line)
> +        child_process_name = match.group(1) if match else 'WebProcess'
> +        match = re.search('pid (\d+)', error_line)
> +        child_process_pid = int(match.group(1)) if match else None
> +        if child_process_pid:
> +            self._port.run_spindump(self._test_name, child_process_pid)
> +        self._server_process.write('#SPINDUMP FINISHED\n')
> +        return True

Is it necessary to have separate logic for spin dumps? I mean, this patch unconditionally replaces the machinery for sampling a process using sample(1) with machinery to call spindump(8). I do not see the need to only do one or the other, especially given that spindump(8) can only be run by root. Could we make use of the existing "PROCESS UNRESPONSIVE" message and incorporate the spin dump logic into Port.sample_process()? Could we incorporate the spin dump logic into Port.sample_process() such that we first call spindump(8) and if that command returns a non-zero exit status (say, the current user does not have password-less sudo access) then we call sample(1) on the WebContent process as we do today?

> Tools/Scripts/webkitpy/port/ios.py:432
> +        parsed_filename = os.path.splitext(test_name)[0]
> +        path_location = os.path.split(parsed_filename)[0]
> +
> +        if not os.path.exists(self.results_directory()):
> +            os.makedirs(self.results_directory())
> +        if not os.path.exists(self.results_directory() + '/' + path_location):
> +            os.makedirs(self.results_directory() + '/' + path_location)

Can we make use of the existing machinery (*) that copies sample(1) reports to the appropriate place in the results directory? If it makes sense to incorporate the spin dump logic into Port.sample_process() then we likely can pass Port.sample_file_path(..., ...) as the path for the -file argument to spindump(8) and make use of the existing machinery the collect samples and moves then to be near the test results for a failed test.

(*) After running a test Manager._look_for_new_crash_logs() is called to collect crash reports and sample reports. Manager._look_for_new_crash_logs() calls Port.look_for_new_samples() for the list of sample reports that were written and uses TestResultWriter.copy_sample_file() to copy each sample report to the appropriate place in the results directory.

> Tools/WebKitTestRunner/TestInvocation.cpp:212
> +    char spindumpMessage[1024];
> +#if PLATFORM(COCOA)
> +    pid_t pid = WKPageGetProcessIdentifier(TestController::singleton().mainWebView()->page());
> +    snprintf(spindumpMessage, 1024, "#TRIGGER SPINDUMP - %s (pid %ld)\n", TestController::webProcessName(), static_cast<long>(pid));
> +#else
> +    snprintf(spindumpMessage, 1024, "#TRIGGER SPINDUMP - %s\n", TestController::webProcessName());
> +#endif

It it necessary to emit a new message here "TRIGGER SPINDUMP"? Could we make use of the existing "#PROCESS UNRESPONSIVE" message and teach run-webkit-test to call spindump(8) when it sees this message instead of sample(1) on applicable platforms?

> Tools/WebKitTestRunner/TestInvocation.cpp:215
> +    if (!TestController::singleton().usingServerMode())
> +        return;

Can you elaborate on this?

> Tools/WebKitTestRunner/TestInvocation.cpp:216
> +    fputs("If running manually, hit enter...\n", stderr);

Is this a workflow that most people will make use of? Running WebKitTestRunner in server mode and wanting to invoke spindump(8) by hand?

> Tools/WebKitTestRunner/TestInvocation.cpp:218
> +    if (!fgets(spindumpMessage, 1024, stdin) || strcmp(spindumpMessage, "#SPINDUMP FINISHED\n"))

Is it necessary to look for the message "#SPINDUMP FINISHED\n"? Would it be sufficient to use fgetc(3) to read a single newline character or EOF character and teach run-webkit-tests to emit a single newline character?

> Tools/WebKitTestRunner/TestInvocation.cpp:219
> +        fputs("Spindump return failure\n", stderr);

I take it that this output is helpful.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.webkit.org/pipermail/webkit-unassigned/attachments/20160721/b4142ea4/attachment-0001.html>