A quick latency evaluation of Apple LL-HLS and the community-driven LHLS
There is an ongoing battle in the field of low-latency live video streaming. Three technologies (MPEG LL-CMAF, Apple LL-HLS, and a community-driven LHLS) are competing for the title of the fastest live streamer. LL-CMAF is an MPEG-led effort built on top of the existing low-latency MPEG-DASH technology. Apple's LL-HLS is another low-latency standard that extends Apple's HTTP Live Streaming (HLS) protocol. There is also a community-driven low-latency HLS variant that originates from Twitter's Periscope platform and is now led by companies such as Twitch, its parent company AWS, and JWPlayer. AWS recently announced a new Interactive Video Service (IVS) built with the community HLS technology.

In this article, I show the results of a quick latency evaluation of Apple's LL-HLS and the community low-latency HLS (CL-HLS). All the tests were built with off-the-shelf tools, and I tried to run the comparison as fairly as possible for the two technologies. For example, both test streams are hosted in the same CDN network in the same geo-location, and the players were run from the same wifi network back to back. However, since both technologies are new to the industry, my tests were limited by the tools available to me. Personally, I don't have a preference between the two, so my tests are not intended to show that one is superior to the other.
In 2016, Twitter's Periscope platform introduced the first version of low-latency HLS. Since this initial effort, the live video streaming community has collaborated on experimenting with technologies for low-latency streaming, such as chunked video encoding and HTTP/1.1 chunked transfer encoding. This eventually led to a draft of the community-driven "LHLS" specification, published in August 2018. However, in June 2019, Apple announced its own Low-Latency HLS (LL-HLS) protocol, which has since been incorporated into the IETF HLS specification.
The community LHLS shares similar design principles with LL-CMAF and Apple LL-HLS. It uses chunked encoding (video segments are divided into CMAF chunks or partial TS segments) and HTTP Chunked Transfer Encoding (CTE) to enable early segment delivery, before whole segments are generated. HTTP CTE is a feature used by servers to deliver large data objects of unknown size (such as video segments that are still being generated by encoders) to clients in a piecemeal manner. A CTE-based transfer ends when the server sends a zero-length chunk to indicate that no more data will arrive. Note that CTE is only used by LL-CMAF and CL-HLS; Apple LL-HLS instead uses the new EXT-X-PRELOAD-HINT tag (a standard HLS tag) to enable early segment delivery.
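To make the CTE framing concrete, here is a minimal Python sketch (my own illustration, independent of any player or server implementation) that decodes a chunked-transfer-encoded body and reports whether the terminating zero-length chunk has arrived:

```python
def decode_chunked(body: bytes) -> tuple[list[bytes], bool]:
    """Decode an HTTP/1.1 chunked-transfer-encoded body.

    Returns the data chunks received so far, and whether the
    terminating zero-length chunk was seen (transfer complete).
    """
    chunks = []
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        crlf = body.find(b"\r\n", pos)
        if crlf == -1:
            return chunks, False      # size line not fully received yet
        size = int(body[pos:crlf], 16)
        if size == 0:
            return chunks, True       # zero-length chunk: end of transfer
        start = crlf + 2
        end = start + size
        if len(body) < end + 2:
            return chunks, False      # chunk data still in flight
        chunks.append(body[start:end])
        pos = end + 2                 # skip the trailing CRLF

# A partial video segment arriving over CTE: two chunks so far, no terminator.
partial = b"4\r\nabcd\r\n3\r\nefg\r\n"
print(decode_chunked(partial))        # ([b'abcd', b'efg'], False)

# The same transfer after the server sends the zero-length chunk.
print(decode_chunked(partial + b"0\r\n\r\n"))  # ([b'abcd', b'efg'], True)
```

A CTE-aware low-latency client can feed each decoded chunk to its decoder as it arrives, instead of waiting for the whole segment to finish.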
Prefetch video segments in CL-HLS
CL-HLS uses a non-standard tag, EXT-X-PREFETCH, to publish in-progress video segments for clients to prefetch. These segments are known as prefetch segments in the CL-HLS specification. Prefetch segments are partial segments that are published early, even before they are fully generated by encoders. A small number of prefetch segments (usually no more than two) near the live edge are published under EXT-X-PREFETCH tags.
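In the community LHLS specification draft, the tag carries a single segment URI, i.e., `#EXT-X-PREFETCH:<URI>`. A hypothetical media playlist with two complete segments followed by two prefetch segments (all URIs invented for illustration) might end like this:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:2.000,
https://example.com/segment100.ts
#EXTINF:2.000,
https://example.com/segment101.ts
#EXT-X-PREFETCH:https://example.com/segment102.ts
#EXT-X-PREFETCH:https://example.com/segment103.ts
```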
In the manifest, complete segments precede the prefetch segments. Complete segments are fully generated and fully available. A prefetch segment becomes a complete segment after N target segment durations, where N is the number of prefetch segments allowed by the server. A CL-HLS client should always request the last prefetch segment, the one that is being generated on the server side, in order to play the stream at low latency. CL-HLS is fully backward compatible in the sense that a legacy HLS player can ignore the prefetch segments and request the earlier complete segments in the manifest, playing at regular latency.
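The client-side rule above can be sketched in Python (a hypothetical helper, not taken from any real player):

```python
def pick_low_latency_uri(playlist: str):
    """Return the URI of the last EXT-X-PREFETCH entry, i.e., the
    segment still being generated, which a CL-HLS client should
    request for the lowest latency. Returns None if the playlist
    has no prefetch segments (legacy behavior applies)."""
    prefetch_uris = [
        line.split(":", 1)[1].strip()
        for line in playlist.splitlines()
        if line.startswith("#EXT-X-PREFETCH:")
    ]
    return prefetch_uris[-1] if prefetch_uris else None

# Example playlist with one complete and two prefetch segments
# (URIs are made up).
playlist = """#EXTM3U
#EXTINF:2.000,
https://example.com/seg100.ts
#EXT-X-PREFETCH:https://example.com/seg101.ts
#EXT-X-PREFETCH:https://example.com/seg102.ts"""
print(pick_low_latency_uri(playlist))  # https://example.com/seg102.ts
```

A legacy HLS player never looks at these lines at all, which is what makes the scheme backward compatible.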
Comparison of CL-HLS and Apple LL-HLS technologies
In this section I compare CL-HLS with Apple's LL-HLS extension of the HLS standard, including general observations on the complexity and sophistication of each technology, along with several test results.
CL-HLS is a simple extension of the HLS specification. First, chunked video encoding and HTTP chunked transfer encoding are widely available technologies. Second, it adds only minimal changes on top of the standard HLS specification, such as the EXT-X-PREFETCH tag.
Apple LL-HLS is a more complicated standard, but with more features. For example, Apple LL-HLS requires enumerating all the chunk information in the manifest, which exposes the internal structure of segments to clients. This potentially provides more room for optimization on the client side. For example, since a player has to explicitly request each chunk in a segment, it can accurately measure the download bandwidth and adapt its bitrate accordingly. For CL-HLS and LL-CMAF, which use HTTP CTE, it is harder to measure the download bandwidth of individual chunks.
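As an illustration of this point, here is a minimal Python sketch of a moving-average bandwidth estimator fed by per-part download measurements. The smoothing factor and the sample numbers are my own assumptions, not any player's actual algorithm:

```python
class BandwidthEstimator:
    """Exponentially weighted moving average of observed throughput.

    Under Apple LL-HLS a player issues one HTTP request per part, so
    each (bytes, seconds) observation maps to one real request. Under
    CTE-based delivery (CL-HLS, LL-CMAF) chunks arrive inside a single
    long response, making per-chunk timings harder to attribute.
    """
    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha          # weight of the newest sample
        self.estimate_bps = None    # current estimate, bits per second

    def observe(self, nbytes: int, seconds: float) -> float:
        sample = nbytes * 8 / seconds
        if self.estimate_bps is None:
            self.estimate_bps = sample
        else:
            self.estimate_bps = (self.alpha * sample +
                                 (1 - self.alpha) * self.estimate_bps)
        return self.estimate_bps

est = BandwidthEstimator()
print(est.observe(250_000, 0.5))  # 4000000.0 (a 250 kB part in 0.5 s)
print(est.observe(250_000, 1.0))  # 3000000.0 (a slower part pulls it down)
```

The estimate then drives the usual adaptive-bitrate decision: pick the highest rendition whose bitrate sits comfortably below the estimated bandwidth.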
Test 1 using Twitch.tv
I set up a test stream of CL-HLS on Twitch.tv and evaluated it. Specifically, I created a Twitch live channel using OBS camera capture as the live input. The OBS-captured stream is pushed to Twitch using RTMP. I played the stream using the Twitch player, a web player embedded in Twitch's web UI. The glass-to-glass delay is around 5 seconds when played from a suburb of Boston.
As seen in Figure 1, the delay was 5 seconds (13:50:29 EST to 13:50:34 EST). The video went from Boston to Twitch's edge servers near Wichita, Kansas (geo-location found by looking up the Twitch media segment edge server's hostname), then back to Boston. The Twitch player ran in the Chrome browser on a Mac computer. The same test was also run in a Chrome/Windows setup; the latency was also 5 seconds.
Here are the test instructions. First, open twitch.tv in the Chrome browser and sign in with an account of your own. Find the RTMP stream key in your Twitch channel's dashboard. Next, open OBS; in the stream configuration, select “Twitch” under “Service”, then put your Twitch channel's RTMP stream key in “Stream Key”. Now you can start OBS camera capture and stream to your Twitch channel. On another device, open Twitch in a browser or in the app, find your channel, and play it. Finally, open time.is to get your current local time and compare it to the timestamp shown on the channel video. The difference is the glass-to-glass latency of the channel.
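The timestamp comparison in the last step is simple enough to script. The following Python helper computes the glass-to-glass latency, using the readings from the Twitch test above as the example values:

```python
from datetime import datetime

def glass_to_glass_latency(wall_clock: str, burned_in: str) -> float:
    """Seconds between the current wall-clock time (e.g., from
    time.is) and the timestamp burned into the played-back video."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(wall_clock, fmt)
            - datetime.strptime(burned_in, fmt)).total_seconds()

# The video shows 13:50:29 while the wall clock reads 13:50:34,
# i.e., 5 seconds of glass-to-glass delay.
print(glass_to_glass_latency("13:50:34", "13:50:29"))  # 5.0
```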
Test 2 using AWS Interactive Video Service (IVS)
In this experiment, I created my own low-latency stream hosted in the same CDN network, and used low-latency video players to play the streams from the same physical location. I managed to do this with CL-HLS and Apple LL-HLS streams both hosted in the AWS Northern Virginia Region. I used a web-based player provided by AWS to play the CL-HLS stream, and an iPhone 7 running iOS 14 beta to play the Apple LL-HLS stream. Specifically, I developed a simple iOS video player using the iOS AVPlayer API. Starting from iOS 14 beta, an entitlement is no longer needed to play LL-HLS streams; this was announced at WWDC 2020.
The tests show that:
- The Apple LL-HLS test with a Wowza sample stream and iOS AVPlayer achieved a glass-to-glass latency of 3 seconds (15:40:37 EST to 15:40:40 EST).
In the above figure, compare the time shown on the computer screen (the current local time from time.is) with the burned-in timestamp shown on the iPhone screen (top-left corner).
- The CL-HLS test with the AWS IVS stream and the IVS sample player achieved a latency of 5 seconds (15:21:55 EST to 15:22:00 EST).
Since both streams are hosted in the same CDN and the same geo-location, and both players were located in the same wifi network in exactly the same spot (I was sitting in the same chair when running both tests back to back), my tests of CL-HLS and Apple LL-HLS provide a relatively fair comparison of the two competing technologies. Apple LL-HLS appears to be about 2 seconds faster than CL-HLS. However, I don't know what protocol Wowza uses to ingest the live input or where the live input comes from. For example, if the live source is also hosted in AWS and the live input is ingested using a real-time protocol such as WebRTC, the ingest delay would be very small. In that case, my comparison would be unfair, because my CL-HLS test used an off-cloud OBS broadcaster and RTMP for ingestion.
There are three differences between the CL-HLS test and the Apple LL-HLS test that might have an impact on the results. First, the CL-HLS stream (AWS IVS) uses TS segments, while the Apple LL-HLS stream (Wowza) uses CMAF segments. However, the media container formats are unlikely to have a big impact on latency. Second, the video chunk duration of the CL-HLS stream is unknown. While chunk duration does affect latency, the impact should be negligible as long as both are very small (e.g., below 500 milliseconds). Also, chunk duration is a configurable encoder parameter; neither CL-HLS nor LL-HLS constrains its value. Third, the CL-HLS stream has 4 bitrates while the Apple LL-HLS stream has only 1, and the stream bitrates are different. It would be fairer to use the same live input with the same encodes for the comparison; however, as of this writing, I could not find any open-source software for this purpose. In the rest of this section, I provide detailed instructions on how I implemented the current test design.
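As a back-of-envelope check on the chunk-duration point, the sketch below models the latency floor of chunked delivery as chunk duration plus fixed encode and network delays. The delay figures are assumptions for illustration, not measurements from these tests:

```python
def latency_floor(chunk_s: float, encode_s: float, network_s: float) -> float:
    """Rough lower bound on glass-to-glass latency for chunked
    delivery: a chunk must be fully produced before it can be sent,
    plus encoder and network delays. Illustrative model only."""
    return chunk_s + encode_s + network_s

# Swapping a 200 ms chunk for a 500 ms chunk (with an assumed 1 s of
# encode delay and 100 ms of network delay) moves the floor by only
# 0.3 s, which is small next to the ~5 s latencies observed above.
print(round(latency_floor(0.2, 1.0, 0.1), 2))  # 1.3
print(round(latency_floor(0.5, 1.0, 0.1), 2))  # 1.6
```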
Instructions for reproducing the results
For CL-HLS, I used the OBS broadcaster and AWS Interactive Video Service (IVS) to generate the CL-HLS stream. AWS IVS is a newly released AWS service that streams low-latency live channels using the CL-HLS technology. AWS IVS is currently only available in the AWS Northern Virginia Region.
The instructions for generating a CL-HLS stream using AWS IVS are as follows.
1. Sign in to your AWS account. I used my own personal account with the free, basic plan.
2. Search for the “AWS IVS” service. The console should look like this:
3. Click the “Create channel” button on the top right corner. Follow the wizard to create your low-latency channel.
4. After creating a new IVS channel, find your RTMP ingest server URL and stream key on the channel info page; you will need them later to configure the OBS broadcaster. Also copy the “Playback URL”. This is the channel playback URL you will put into the player later.
5. Open the OBS broadcaster on your computer. Go to “Settings -> Stream”, enter your live channel's ingest server URL into “Server” and the stream key into “Stream Key”. Close the OBS “Settings” window, and click the “Start Streaming” button in the main window to start OBS camera capture and streaming to the AWS IVS ingest endpoint. To measure latency accurately, you might want to burn a timestamp into your captured video. Please follow this page for how to do so.
6. Now your live channel should be up and running. Open the AWS IVS sample player in your browser (e.g., Chrome). It looks like the following.
7. Open time.is to get your current local time and compare it to the burned-in timestamp in the video side by side; the difference is the glass-to-glass latency caused by CL-HLS.
For Apple LL-HLS, I used a sample stream from Wowza, which also happens to be hosted in the AWS Northern Virginia Region (the server IP was found to be located in Ashburn, VA, the town where the AWS Northern Virginia data center is located; see Figures 7 and 8). Next, I connected the iPhone to Xcode 12 running on a MacBook. You need a USB-C to Lightning cable to connect the two devices, and to build an iOS app against iOS 14 beta, you have to use Xcode 12. I built and installed the AVPlayer application on the iPhone, and simply opened the app to play the Wowza test stream. Again, open time.is to get the current local time and compare it to the embedded timestamp in the video. This gives the glass-to-glass latency of Apple LL-HLS.
About the author
Bo Zhang is currently a staff video research engineer at Brightcove Inc. His areas of expertise include online video streaming, IP networking, and telecommunications. He received his Ph.D. from George Mason University, his M.S. from the University of Cincinnati, and his B.S. from Huazhong University of Science and Technology, all in computer science. He received the best paper award at ACM MSWiM 2011, the flagship research conference of ACM SIGSIM. He is involved in organizations such as the DASH Industry Forum, MPEG (ISO/IEC JTC 1/SC 29/WG 11), and CTA-WAVE.