<html>

    <head>

      <base href="https://bugs.webkit.org/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - JetStream should have a more rational story for jitter-oriented latency tests"

   href="https://bugs.webkit.org/show_bug.cgi?id=145762">145762</a>

          </td>

        </tr>


        <tr>

          <th>Summary</th>

          <td>JetStream should have a more rational story for jitter-oriented latency tests

          </td>

        </tr>


        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr>


        <tr>

          <th>Product</th>

          <td>WebKit

          </td>

        </tr>


        <tr>

          <th>Version</th>

          <td>528+ (Nightly build)

          </td>

        </tr>


        <tr>

          <th>Hardware</th>

          <td>All

          </td>

        </tr>


        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>


        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>


        <tr>

          <th>Severity</th>

          <td>Normal

          </td>

        </tr>


        <tr>

          <th>Priority</th>

          <td>P2

          </td>

        </tr>


        <tr>

          <th>Component</th>

          <td>Tools / Tests

          </td>

        </tr>


        <tr>

          <th>Assignee</th>

          <td>webkit-unassigned&#64;lists.webkit.org

          </td>

        </tr>


        <tr>

          <th>Reporter</th>

          <td>fpizlo&#64;apple.com

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Currently we have some latency tests that are meant to measure jitter.  They do this by computing the RMS.  But the RMS is a pretty bad metric: it rewards behavior that isn't really what you'd want your browser to do.  These RMS-based tests take the geomean of two numbers: the RMS of some samples and the average of those same samples.  The lower the geomean, the better (in the JetStream harness we then invert the scores so that higher is better, but let's ignore that for this discussion and assume that lower is better).

Here's an example of how this can go bad.  A browser that always computes a task in some horribly long time (say, 1000ms) but never varies that time will perform better than a browser that usually computes the task super quickly (say, 10ms) and sometimes just a little bit less quickly (say, 15ms).  The former browser will have an RMS of 0 and an average of 1000.  The latter will have an RMS somewhere around 3.5 and an average of 12.5 (assuming the 10ms and 15ms cases are equally likely).  Because the RMS is a factor in the geomean, the former browser scores sqrt(0 * 1000) = 0, the best possible score.  The latter scores sqrt(3.5 * 12.5), or about 6.6.
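
Here is a minimal sketch of the current scoring, for illustration only.  It assumes "RMS" means the root-mean-square deviation from the mean with an n - 1 denominator (Python's statistics.stdev).  That reading reproduces the ~3.5 figure above for one 10ms sample and one 15ms sample.  The real JetStream harness may compute it differently.  The function and variable names are hypothetical.

```python
import math
import statistics


def rms_deviation(samples):
    # Assumed reading of "RMS": root-mean-square deviation from the mean with
    # an n - 1 denominator, i.e. the sample standard deviation.
    return statistics.stdev(samples)


def current_score(samples):
    # Hypothetical helper: geomean of the jitter term and the average.
    # Lower is better.
    return math.sqrt(rms_deviation(samples) * statistics.mean(samples))


slow_but_steady = [1000.0, 1000.0]  # always 1000ms
fast_but_jittery = [10.0, 15.0]     # 10ms or 15ms, equally likely

print(current_score(slow_but_steady))   # 0.0
print(current_score(fast_but_jittery))  # about 6.65
```

The zero jitter term wipes out the 1000ms average entirely, which is exactly the pathology described above.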


JetStream should not have this pathology.  The right way to avoid it is to replace the RMS with some other metric of how bad things get.  A good metric is the average of the worst percentile of samples.  Averaging the worst 1% or the worst 5% would both work.  This will still catch cases where the VM jittered due to JIT compilation or GC.  But it can never have the pathology of giving the better score to a VM whose best case is worse than another VM's worst case.</pre>
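        <pre>Here is a minimal sketch of the proposed metric, for illustration only.  It assumes the worst-percentile average simply takes the place of the RMS in the existing geomean.  It also always counts at least one sample.  The names average_of_worst and proposed_score are hypothetical and not part of the JetStream harness.

```python
import math
import statistics


def average_of_worst(samples, fraction=0.05):
    # Mean of the slowest `fraction` of the samples (0.05 = worst 5%).
    # At least one sample is always included so short runs still work.
    count = max(1, math.ceil(len(samples) * fraction))
    worst = sorted(samples, reverse=True)[:count]
    return statistics.mean(worst)


def proposed_score(samples, fraction=0.05):
    # Hypothetical helper: geomean of the worst-case average and the
    # overall average, replacing the RMS term.  Lower is better.
    return math.sqrt(average_of_worst(samples, fraction) * statistics.mean(samples))


slow_but_steady = [1000.0] * 20               # always 1000ms
fast_but_jittery = [10.0] * 10 + [15.0] * 10  # 10ms or 15ms, equally likely

print(proposed_score(slow_but_steady))   # 1000.0
print(proposed_score(fast_but_jittery))  # about 13.69
```

The worst-case average is never below the overall average, so this score can never drop below the browser's own average.  As a result, a browser whose best case is worse than another's worst case always scores worse.</pre>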

        </div>

      </p>


    </body>

</html>