[webkit-dev] [MSE] Range ends inclusion is deleting wanted MediaSample's

Alicia Boya García aboya at igalia.com
Thu Nov 9 14:03:16 PST 2017

Hi, WebKittens!

In the YouTube Media Source Extensions conformance tests there is one
called 36.AppendOpusAudioOutOfOrder where two audio media segments are
appended out of order to a SourceBuffer: First, a segment with the PTS
ranges [10, 20) is added. Then, another one with [0, 10) is added.

(I have rounded the actual timestamps to the nearest integers for easier
reading.)

Almost at the very end of the process the buffered ranges are like this:

[ 0,  9)
[10, 20)

At this point, SourceBuffer::sourceBufferPrivateDidReceiveSample() is
called with the last audio frame, which has PTS=9 and DUR=1.

The execution reaches this conditional block:

if (trackBuffer.highestPresentationTimestamp.isValid() &&
trackBuffer.highestPresentationTimestamp <= presentationTimestamp) {

trackBuffer.highestPresentationTimestamp contains the highest PTS so far
within the current segment. The condition is true (9 <= 9) as expected
for sequentially appended frames.

Inside there is this block of code:

MediaTime highestBufferedTime = trackBuffer.buffered.maximumBufferedTime();

PresentationOrderSampleMap::iterator_range range;
if (highestBufferedTime - trackBuffer.highestPresentationTimestamp <
    range =
    range =

if (range.first != trackBuffer.samples.presentationOrder().end())
    erasedSamples.addRange(range.first, range.second);

The first if block there is an optimization: it decides whether to do a
binary search over the entire collection of MediaSample objects or a
linear search starting from the MediaSample with the highest PTS (which
is faster when appends always occur at the end). The result is the same
in both cases.
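
To illustrate why the two search strategies are interchangeable, here is
a minimal sketch. It is not WebKit's code: SampleMap, findByBinarySearch
and findByScanFromEnd are hypothetical names, timestamps are plain ints
instead of MediaTime, and the map stores PTS -> duration. Both functions
select keys in (beginTime, endTime], the convention this message
describes for the real lookup functions.

```cpp
#include <iterator>
#include <map>
#include <utility>

// Hypothetical stand-in for the sample collection: PTS -> duration.
using SampleMap = std::map<int, int>;
using Range = std::pair<SampleMap::const_iterator, SampleMap::const_iterator>;

// Binary search over the whole map: O(log n) regardless of position.
Range findByBinarySearch(const SampleMap& samples, int beginTime, int endTime)
{
    // upper_bound(x) returns the first key strictly greater than x.
    return { samples.upper_bound(beginTime), samples.upper_bound(endTime) };
}

// Linear scan starting at the highest PTS: cheap when the range of
// interest sits at the very end of the map.
Range findByScanFromEnd(const SampleMap& samples, int beginTime, int endTime)
{
    auto last = samples.end();
    // Step back over keys that are greater than endTime.
    while (last != samples.begin() && std::prev(last)->first > endTime)
        --last;
    auto first = last;
    // Keep stepping back while keys are still greater than beginTime.
    while (first != samples.begin() && std::prev(first)->first > beginTime)
        --first;
    return { first, last };
}
```

For any map and any pair of times, both functions should return the same
pair of iterators; only the cost of getting there differs.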

presentationOrder() is a std::map<MediaTime, MediaSample>.

findSamplesWithinPresentationRange(beginTime, endTime) and its *FromEnd
counterpart both return a pair of STL-style iterators which cover a
range of MediaSample objects whose presentation timestamps sit in the
range (beginTime, endTime] (beginTime is exclusive, endTime is inclusive).

Then, it marks those MediaSample objects (frames) for deletion.
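
The (beginTime, endTime] lookup can be sketched on a plain std::map as
follows. This is only an approximation under my own assumptions, not the
actual WebKit implementation: the names samplesInOpenClosedRange and
timestampsIn are made up, and timestamps are ints rather than MediaTime.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the sample collection: PTS -> duration.
using SampleMap = std::map<int, int>;
using Range = std::pair<SampleMap::const_iterator, SampleMap::const_iterator>;

// Returns iterators covering the samples whose PTS lies in
// (beginTime, endTime].
Range samplesInOpenClosedRange(const SampleMap& samples, int beginTime, int endTime)
{
    // upper_bound(x) returns the first key strictly greater than x, so a
    // sample at exactly beginTime is skipped and one at exactly endTime
    // is included.
    return { samples.upper_bound(beginTime), samples.upper_bound(endTime) };
}

// Collects the PTS of every sample in the range, for inspection.
std::vector<int> timestampsIn(const Range& range)
{
    std::vector<int> result;
    for (auto it = range.first; it != range.second; ++it)
        result.push_back(it->first);
    return result;
}
```

With samples at PTS 8, 9, 10 and 11, asking for the range (9, 10] selects
only the sample at PTS=10.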

My question is... shouldn't the inclusivity of the range ends be the
other way around, i.e. [beginTime, endTime)?

As I understand it, the point of that part of the algorithm is to delete
old samples that are -- even partially -- in the presentation time range
of the newly appended one, but using (beginTime, endTime] fails to
accomplish that in two cases:

a) If there is an existing MediaSample with PTS=9 and DUR=1 it will not
be removed because beginTime (=9) is exclusive.

b) If there is an existing MediaSample with PTS=10 and DUR=1 it WILL be
removed even though there is no overlap with the sample being appended
(PTS=9 DUR=1) because endTime (=10) is inclusive. This is exactly what
is making the YTTV test fail in my case.
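
Cases a) and b) can be reproduced with a small sketch that compares the
two conventions side by side. Again, this is an illustration under my
own assumptions (hypothetical function names, int timestamps, a plain
std::map of PTS -> duration), not a patch against WebKit. Like the
lookups described above, it only looks at the start timestamps of the
existing samples.

```cpp
#include <map>
#include <vector>

// Hypothetical stand-in for the sample collection: PTS -> duration.
using SampleMap = std::map<int, int>;

// PTS of the existing samples selected by (beginTime, endTime].
std::vector<int> selectedOpenClosed(const SampleMap& samples, int beginTime, int endTime)
{
    std::vector<int> result;
    auto last = samples.upper_bound(endTime);   // first key > endTime
    for (auto it = samples.upper_bound(beginTime); it != last; ++it)
        result.push_back(it->first);
    return result;
}

// PTS of the existing samples selected by [beginTime, endTime).
std::vector<int> selectedClosedOpen(const SampleMap& samples, int beginTime, int endTime)
{
    std::vector<int> result;
    auto last = samples.lower_bound(endTime);   // first key >= endTime
    for (auto it = samples.lower_bound(beginTime); it != last; ++it)
        result.push_back(it->first);
    return result;
}
```

With existing samples at PTS=9 and PTS=10 (DUR=1 each) and a new sample
with PTS=9 and DUR=1, so beginTime=9 and endTime=10, the (9, 10] version
selects the sample at PTS=10 and the [9, 10) version selects the sample
at PTS=9.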


    Buffered ranges before adding [9, 10):

        [ 0,  9)
        [10, 20)

    Expected result after adding [9, 10):

        [0, 20)

    Actual result in WebKit:

        [ 0, 10)
        [11, 20)

-- Alicia.
