[Webkit-unassigned] [Bug 46453] [NRWT] Put the http and websocket tests to end of the test list

Mon Sep 27 17:01:42 PDT 2010

https://bugs.webkit.org/show_bug.cgi?id=46453

--- Comment #12 from Gabor Rapcsanyi <rgabor at inf.u-szeged.hu>  2010-09-27 17:01:41 PST ---
(In reply to comment #10)
> Ugh. Okay, I've taken a look at both the patch posted here and the patch on GitHub. They're troubling, but then again I find anything that requires locking, especially cross-process locking, to be troubling :)
>
> First, assuming NRWT is fully exploiting the parallelism of the machine -- which it should be doing, ignoring the currently open bugs that are throttling it  -- a single NRWT run should pin the machine and running multiple in parallel will at best slow things down and more likely introduce more flakiness and weird timeouts and failures.
>
> So, I need to step back and ask, why do we want to run more than one NRWT in parallel? Adam, Eric - how do you see what I wrote above interacting with the commit queue? Gabor, are there other reasons you want to run these things in parallel?
>

We have some multicore servers with several build and try bots, so it would be useful if we could run more than one layout test per machine.

> Second, in order to fix the currently open bugs that are causing multithreaded NRWT to not really work (see, e.g., bug 36622, bug 38300, bug 43565 ), we need to switch NRWT to a multi-process model instead. Which means that (a) anything you do that relies on inter-thread locking will need to be reworked, (b) it's likely that the shard-by-directory mechanism currently used will be reworked as well. I would really prefer not to have this change and that change going on at the same time.
>

If I put all tests what needs locking to a same group as Tony suggested then we don't need thread locking anymore. Just one thread will run the http tests and do the http server locking locally.

> Third, I do not want to do anything that slows down the "normal" execution path of running one NRWT on a machine at a time or introduces more flakiness (see above). Given that, it's unclear what the best way to interleave http- and websocket-requiring tests is with other tests. In a single-threaded model, it doesn't really matter, since the total execution time is fixed. But, in a multi-threaded model, it may be better to try and ensure that we distribute http- and websocket- based tests evenly over the entire test run in order to minimize the load on the servers and reduce flakiness. We sort of get that today by starting http tests at the beginning of the run in (one of) the threads while running most of the tests in others. I would want to see either that (a) we determine that the servers can handle whatever concurrency we throw at them (unlikely), or (b) we come up with some sharding mechanism here that allows us to continue to smooth the load as much as possible without
  introducing too much additional complexity. Of course, if it turns out that we really do want to run multiple NRWTs in parallel, then we want to minimize the amount of time we hold the lock :(
>

I can make an option check and put the http tests to first when we don't need locking.

> Fourth, now looking at the particulars of this patch, I would really like to see the algorithm abstracted better and the intent captured more clearly. I'm thinking of things along the lines of:
>
> 1) hide whether or not a test requires a lock behind an interface or method, e.g. test_requires_lock() that encapsulates all of the path splitting and parsing.
>

Thats a good idea, I will put this method to run_webkit_tests.TestRunner.

> 2) Hang the server lock on some fairly obvious global object like TestRunner (which you already have in your GitHub patch, but make them private), and instead of passing the locks to each thread, expose methods on TestRunner to acquire and release the lock as possible. As Tony suggested, it might be helpful here to track how many lock-requiring tests remain and release the lock automatically when you're done. Of course, any shared-memory approach like this will still break when I switch to a multi-process model, but if we pick the right methods to expose on TestRunner to signal when we need the lock, it might be possible to keep this relatively clean.
>

As I said above the server lock will be moved to the thread.

> Lastly, as a nit, the past tense of "split" in English is "split", so "splitted_dir_path" is a bit awkward. I would suggest "split_path", "split_dir_path", "separated_path", or "separated_dir_path" as alternatives, probably in that order. But, again, if you hide the logic of testing whether or not the file needs the lock behind a method, that makes it better as well.

My bad, I will correct :)

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.