[Webkit-unassigned] [Bug 200949] Media Source Extensions performance during seek

Sat May 2 14:38:32 PDT 2020

https://bugs.webkit.org/show_bug.cgi?id=200949

--- Comment #13 from Dustin Kerstein <dustin.kerstein at gmail.com> ---
> The only thing being all I-frames does is mean you can have a constant seek
> time for any point in the timeline, not necessarily a fast seek time, and
> definitely not necessarily a faster-than-realtime seek time.

Very true. Though faster-than-rAF decoding isn't required for PanoMoments. As long as the seek operation doesn't squash a previous seek (or we have a way to protect against overloading), any amount of time the decoder takes is fine. We use video.readyState in our Chrome/Firefox MSE implementations to prevent overloading the decoder, but Safari doesn't appear to change its readyState when seeking, so we can't throttle the seek requests. Do you know if this is intended?
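
For illustration, a minimal TypeScript sketch of readyState-gated seek throttling of the kind described above might look like this. It is not the PanoMoments code: `VideoLike` and `SeekThrottler` are hypothetical names, and the sketch assumes the browser lowers readyState below HAVE_CURRENT_DATA while a seek is still being decoded (which, per the above, Safari does not appear to do).

```typescript
// Minimal shape of the <video> element this sketch needs; a real
// HTMLVideoElement satisfies it structurally.
interface VideoLike {
  readyState: number;
  currentTime: number;
}

// Same value as HTMLMediaElement.HAVE_CURRENT_DATA.
const HAVE_CURRENT_DATA = 2;

// Hypothetical helper: remembers only the newest requested time and issues
// it once the element reports that it has data for the current position.
class SeekThrottler {
  private pending: number | null = null;
  private readonly video: VideoLike;

  constructor(video: VideoLike) {
    this.video = video;
  }

  // Record the viewer's latest target; older unissued targets are dropped.
  request(time: number): void {
    this.pending = time;
    this.pump();
  }

  // Call on every animation frame (or on "seeked"). Returns true if a seek
  // was issued by this call.
  pump(): boolean {
    if (this.pending === null) return false;
    if (this.video.readyState < HAVE_CURRENT_DATA) return false;
    this.video.currentTime = this.pending;
    this.pending = null;
    return true;
  }
}
```

Keeping only the latest target means a burst of viewer input collapses into a single seek once the decoder is free, rather than queueing every intermediate frame.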

> Seeking into the middle of a long GOP can be expensive, true. But the
> general decode cost of an all I-frame movie is much, much higher.  I-frame
> decoding is expensive, both in file size and decode time.

Indeed. Though for ad-hoc (and non-contiguous) decoding we've not yet found a better way. MSE + all I-frames does have a huge side benefit though: we can "stream" frames and let a viewer interact before the entire video is downloaded. We use an asterisk-like algorithm that allows playback to start with 20% of the total frames and then seamlessly layers in the remaining frames as they download.
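
To make the idea of starting with a sparse subset of frames concrete, here is one possible coarse-to-fine download order in TypeScript. This is only an assumed, simplified stand-in and not the asterisk-like algorithm itself, whose details aren't given here; `progressiveOrder` and `nearestLoaded` are hypothetical names.

```typescript
// Hypothetical coarse-to-fine ordering: the first pass takes every
// `stride`-th frame (stride 5 is roughly 20% of the frames), and each
// later pass fills in one more offset between them.
function progressiveOrder(frameCount: number, stride: number): number[] {
  const order: number[] = [];
  for (let offset = 0; offset < stride; offset++) {
    for (let i = offset; i < frameCount; i += stride) {
      order.push(i);
    }
  }
  return order;
}

// Pick the already-downloaded frame closest to the one the viewer asked
// for; on a tie the frame that appears first in `loaded` wins.
function nearestLoaded(target: number, loaded: number[]): number {
  if (loaded.length === 0) throw new Error("no frames loaded yet");
  return loaded.reduce((best, f) =>
    Math.abs(f - target) < Math.abs(best - target) ? f : best
  );
}
```

With 10 frames and a stride of 5, the first pass yields frames 0 and 5, so interaction can begin once those two have arrived, and `nearestLoaded` substitutes the closest available frame until the gaps are filled.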

> The MSE specification is built upon the premise of positive playback rates.
> By seeking backward one frame at a time to simulate a negative playback
> rate, you're swimming against the stream as far as optimizations we've built
> into the decoder to support normal playback.

Yep, that's why our main MSE implementation on Chrome/Firefox uses Mode=Sequence and a monotonically increasing timestamp regardless of the viewer's chosen frame (forwards, backwards, skipping 10, etc.). We originally tried this approach with WebKit, but it appears to still use the frame's PTS when Mode=Sequence is set, and it starts to slow down / break after a short while (to see this in action, just switch the MIME type back to 'video/mp4' in https://jsfiddle.net/wsm8gh7e). With the addSourceBuffer('audio/mpeg') workaround, this slowdown / break is avoided, though I admit it's a rather flaky solution given that it relies on pretending we're submitting audio when we're most definitely not. That raises a question: should WebKit be setting m_shouldGenerateTimestamps when the user sets Mode=Sequence? Chrome/Firefox appear to generate their own timestamps when the user sets sequence mode.
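
The bookkeeping behind the monotonically increasing timestamp can be sketched in TypeScript as follows. This is an assumption-laden illustration, not our production code: `MonotonicTimeline` is a hypothetical name, and it assumes that with `sourceBuffer.mode = 'sequence'` each appended single-frame segment is placed directly after the previous one, whatever PTS the frame carries internally (the behaviour WebKit does not appear to follow for 'video/mp4').

```typescript
// Hypothetical bookkeeping for a SourceBuffer in "sequence" mode. Every
// append is assumed to occupy the next slot of `frameDuration` seconds.
class MonotonicTimeline {
  private readonly history: number[] = [];
  private readonly frameDuration: number;

  constructor(frameDuration: number) {
    this.frameDuration = frameDuration;
  }

  // Call after appending the segment for source frame `frameIndex`.
  // Returns the presentation time to seek to in order to display it.
  append(frameIndex: number): number {
    const slot = this.history.length;
    this.history.push(frameIndex);
    return slot * this.frameDuration;
  }

  // Map a presentation time back to the source frame shown there, or
  // undefined if nothing has been appended at that time yet.
  frameAt(time: number): number | undefined {
    const slot = Math.floor(time / this.frameDuration);
    return this.history[slot];
  }
}
```

Because the returned time only ever grows, the media element is always seeking forward, even when the viewer is stepping backwards through the source frames.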

> Frankly, the MSE API is not built for what you're trying to achieve, and I
> think you're barking up the wrong tree here. If the MSE specification
> explicitly allowed negative playback rates, you could just specify some
> negative rate and let the media engine do all its optimizations in your
> favor, including pre-decoding as-of-yet-undisplayed frames, and dropping
> frames when the decoder can't keep up with the rate you're requesting. But
> it doesn't, and you're trying to force it to through seeking.

We've prototyped using negative rates (and the dual-stream forwards+backwards design you mentioned in your later comment). While they were functional, there were many browser/device-specific idiosyncrasies, and to get them working well enough (particularly the seek time when switching directions) we ended up with a total file size that wasn't justifiable, especially considering that these designs lose the ability to "stream" the way the MSE + keyframe design can. Bit of a tangent, but you'd probably appreciate how we solved this in native code (see here for a demo https://apps.apple.com/us/app/panomoments/id1227039970 and here for the beta SDK https://github.com/MomentCaptureInc/PanoMoments). It takes a highly compressed long-GOP H.264 elementary stream and transcodes it to a 100% keyframe stream during the download: essentially a streaming transcoder. This design was only feasible because of the GPU-accelerated encoding/decoding that the native SDKs provide.

I've also recently tried a non-seeking approach with the Mode=Sequence + addSourceBuffer('audio/mpeg') workaround. It works for a little while, but playback always ends up stalled, even when we detect this state and manually call play() again. The monotonically increasing PTS + seeking approach was definitely the best design for Chrome/Firefox, and it's very close to working with WebKit (with the 'audio/mpeg' workaround). If readyState reflected the overloaded condition (as it does on Chrome/Firefox), we'd be able to throttle the seek calls. Do you think there's a chance that would be possible/appropriate to implement within WebKit? Let me know if you see any other possible workarounds. Thanks!
