MPEG-DASH Support in YouTube
As of last week, YouTube has started to serve 50% of
its videos in MPEG-DASH. This is true for the YouTube TV application, https://www.youtube.com/tv#/browse
- I think this will be a big help for player deployment. Do not forget
that DASH is about packaging and multiplexing, not about codecs.
Encoding will still be a business if it evolves towards smart
handling of multiple packaging formats (HLS, DASH, SS, HDS) by reusing the same encoded
material (H.264, ...) and if it integrates smart player device adaptation and
content rules enforcement.
Google is also adding DASH support to the Chrome player (it should be in Chrome version 23).
DASH Basics: MPD and Segments
Let’s quickly summarize what DASH content is made of:
§ MPD: an XML document describing where the various media resources of the content are located. The media resources can be single-media (for example, a video-only MP4 file) or a multiplexed set of streams (for example, an audio-video MPEG-2 Transport Stream). Streams can be scalable (such as SVC), but we won’t go into such details, as GPAC doesn’t support advanced description of scalable streams in DASH. Some media resources may exist in different versions, for example different bitrates, languages or resolutions. In DASH, such a “version” of the stream is called a representation, and all representations of the same content are grouped together in an AdaptationSet.
§ segment: a continuous part of a media resource. The segment is the smallest part of the media that can be located in the MPD. What exactly a segment contains depends on the underlying media format of the content.
§ subsegment: a continuous part of a segment, or of a subsegment.
§ sidx: short name for SegmentIndexBox, an ISOBMF (MP4/3GP container) structure describing a segment by giving its earliest presentation time, how the segment is further divided into subsegments, and the locations (byte offsets) and timing of the random access points in the segment payload. The goal of the sidx is to build an index of the segment at a given granularity, in order to simplify trick modes (seeking, fast-forward, fast-rewind, …).
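As a side note, if you want to check the sidx boxes MP4Box actually wrote, you can dump the box structure of a file to XML using the -diso switch (the file name below is just a placeholder):
MP4Box -diso my_dashed_file.mp4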
There are several ways to refer to a segment in an MPD. If the file is made of a single segment (-single-segment option for ISOBMF), one will likely use the SegmentBase element. If a file is made of several segments, each segment will be identified by the SegmentList syntax in the MPD, using byte ranges. For other cases, we need to instruct MP4Box how to refer to segments (and how to store them as well). The following switches are defined:
§ -segment-ext EXT: tells MP4Box to generate segments with the EXT extension (by default, m4s for ISOBMF and ts for MPEG-2)
§ -segment-name NAME: tells MP4Box to generate each segment in a dedicated file, called NAME%d.EXT. NAME can also contain %s, in which case %s will be replaced by the name of the file being dashed, without folder indication and extension. By default, such segments will be described using the SegmentList syntax in the MPD.
§ -url-template: if set when generating segments in different files, the segments will be referred to using the SegmentTemplate syntax in the MPD (see the MPD sketch below).
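To give an idea of what this looks like, here is a minimal hand-written MPD sketch using the SegmentTemplate syntax. The element and attribute names come from the DASH specification, but the durations, bandwidth, codec string and file names are purely illustrative:
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" mediaPresentationDuration="PT1M40S">
  <Period>
    <AdaptationSet segmentAlignment="true">
      <SegmentTemplate media="myDash$Number$.m4s" initialization="myDashinit.mp4" duration="10" startNumber="1"/>
      <Representation id="1" mimeType="video/mp4" codecs="avc1.42c00d" bandwidth="500000"/>
    </AdaptationSet>
  </Period>
</MPD>
With SegmentList addressing, the SegmentTemplate element would be replaced by an explicit list of SegmentURL entries (possibly with byte ranges); with SegmentBase, a single URL plus the sidx index is enough to locate everything.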
ISO Base Media File Format
For content based on ISOBMF (ISO/IEC 14496-12), MP4Box can be used to cut files into DASH segments. Before going any further, some definitions are needed:
§ segment: for ISOBMF, a segment is a consecutive set of movie fragments. Each movie fragment is composed of a moof box followed by mdat box(es), and all data addressing in the mdat(s) is done using relative offsets in the moof.
§ subsegment: a part of a segment, made of a consecutive set of movie fragments. A subsegment can be further divided into subsegments, until only a single movie fragment per subsegment is left.
With that in mind, we can generate DASH content by playing with
the following MP4Box parameters:
§ -dash X: produce segments of roughly X milliseconds.
§ -frag Y: use movie fragments of roughly Y milliseconds. By default, the fragment duration is 500 milliseconds.
§ -rap: attempts to cut segments so that they all start with an access point (an IDR, an I-frame or the beginning of a gradual decoding refresh, for example).
§ -subsegs-per-sidx N: specifies how many subsegments per sidx we would like. This only covers the first level of segment splitting (MP4Box doesn’t handle further subdivision of subsegments into subsegments). Notable values are:
§ <0: disabled; no sidx will be produced
§ 0: a single sidx box is used for the entire segment, and each subsegment is made of a single movie fragment (i.e., there will be X/Y subsegments in the sidx). This is the default value.
§ >0: produces N subsegments referenced in the first sidx, each made of (X/Y)/N movie fragments.
§ -daisy-chain: this is only used when producing multiple subsegments per segment (-subsegs-per-sidx). If specified, subsegments will be described in sidx boxes in which the last entry (subsegment) points to the next sidx. Otherwise, multiple sidx boxes will be stored in a hierarchical way, with the first sidx pointing to the sidx of each subsegment.
§ -single-segment: special mode indicating that the file should be segmented as one single segment. In that case, the DASH duration X becomes the subsegment duration, and a single sidx is produced before any movie fragment.
Now let’s see a few examples.
Dashing a file into 10-second, RAP-aligned segments with a fragment duration (i.e. subsegment duration, since we don’t subdivide the sidx) of 1 second:
MP4Box -dash 10000 -frag 1000 -rap test.mp4
The same with separate segment files using template addressing, and 5 subsegments per segment:
MP4Box -dash 10000 -frag 1000 -rap -segment-name myDash -subsegs-per-sidx 5 -url-template test.mp4
Generating an onDemand profile DASH file (single segment) is just as simple:
MP4Box -dash 10000 -frag 1000 -rap -single-segment test.mp4
MPEG-2 TS
MP4Box can also be used to segment MPEG-2 TS files. The same options as in the ISOBMF case are used, with the following restrictions:
§ -single-segment, -frag, -subsegs-per-sidx and -daisy-chain are ignored
§ -rap splits at the PAT preceding the RAP found, but does not repacketize the TS to make sure it begins with the RAP
For example, splitting a TS into 10-second segments can be done with:
MP4Box -dash 10000 -url-template -segment-name segments test.ts
Also note that it is possible to use MP4Box to translate an existing m3u8 (Apple HLS) playlist into a conformant MPD, using the -mpd switch:
MP4Box -mpd test.mpd [-url-template] [http://...]myfile.m3u8
Multiple Representations
You now know how to create conformant DASH content from a given file, but what about the ‘A for Adaptive’ in DASH? At first thought, it would be enough to leave you with a bunch of MPDs and a good XSLT to produce your final MPD (which you will have to do anyway, I believe). However, there are some tricks in the segment generation itself that cannot easily be done that way.
The most problematic point is that, when building ISOBMF files designed for bitstream switching, the initial bootstrap of the DASH session (i.e. the moov and co) must contain all the sample descriptions used in all representations of the media. Therefore, we will need MP4Box here to generate files with correct sample descriptions, and segments with the correct sampleDescriptionIndex.
Although this might look a bit complex, the process itself is quite simple; assuming you have encoded file1, ..., fileN versions of your movie, you can generate a nice adaptive MPD as follows:
MP4Box -dash 10000 [other options as seen above] -out final.mpd file1.mp4 ... fileN.mp4
This works for both ISOBMF and TS files. Additionally, you don’t want segments to be overwritten when generating segments in dedicated files for each representation. MP4Box gives you the possibility to format the name of the segments using -segment-name:
MP4Box -dash 10000 -segment-name mysegs_%s -url-template -out final.mpd file1.mp4 ... fileN.mp4
Osmo4 Playback
You can test DASH playback in GPAC using Osmo4/MP4Client. The
player supports a good part of the technology:
§ playback from an HTTP server or from local storage for test purposes
§ media segments based on TS and ISOBMF
§ independent component download (one adaptation set for audio, one for video)
§ most of the MPD syntax
§ (some) bitstream switching; ISOBMF support is not complete, as multiple SampleDescriptions are not supported yet
§ manual quality switching using ctrl+h and ctrl+l
§ basic automatic quality switching when playing HTTP URLs
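For instance, assuming final.mpd was generated as in the examples above (the local file name and the server URL below are placeholders), playback can be tested with:
MP4Client final.mpd
MP4Client http://yourserver/final.mpd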
Some guidelines
DASH is still quite new, and few players support it. The main difficulty, especially with ISOBMF, is to address the bitstream switching scenario without breaking existing implementations. We would therefore recommend the following:
§ do NOT mix codecs in your various representations of the media streams (whether multiplexed or not). This is not supported by existing players and will break seamless switching in most cases. And there is no point in doing so, so don’t do it
§ for AVC video, use dedicated SPS and PPS IDs for your different streams. This can even be done with x264 using the --sps-id option (see the sketch after this list). This will ensure that the produced files can still be played by most implementations supporting movie fragments.
§ obey the specification: use the same codec for each media in an adaptation set.
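To illustrate the SPS/PPS ID trick, here is a rough sketch of how two representations could be encoded with x264 and then DASHed together. The input name, bitrates and output names are placeholders; the point is only the distinct --sps-id values:
x264 --sps-id 1 --bitrate 500 -o low.264 source.y4m
x264 --sps-id 2 --bitrate 2000 -o high.264 source.y4m
MP4Box -add low.264 -new low.mp4
MP4Box -add high.264 -new high.mp4
MP4Box -dash 10000 -frag 1000 -rap -out final.mpd low.mp4 high.mp4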
That's awesome news!
Hello Rajiv, thanks for the excellent information on the YouTube streaming architecture. As I read about DASH, it is codec-agnostic. So I presume one can theoretically stream data encoded with any video codec, correct?
As far as YouTube is concerned, since it uses the Flash player, do you think there will be any need to transcode the content, given the multitude of mobile devices out there on the market? Note that each device has different characteristics and may not support the codec used by YouTube.
It would be really great if you could elaborate on this aspect. Thank you!
Anil,
YouTube does support multiple formats for streaming; they don't rely only on FLV transcoding. If you go through the basics you will learn about it: http://en.wikipedia.org/wiki/YouTube. YouTube has come up with its own DASH implementation until the standard matures.
DASH is specifically designed to break the monopoly of device-driven protocols like HLS, HDS, SS, etc. DASH is supported on all devices. If you have any specific question, send it across.
As I can see from the traces I captured (using Fiddler), YouTube sends audio and video separately in different HTTP messages. While doing so, it sets the Content-Type header to "application/octet-stream". This makes it difficult to understand which container it is actually using; for example, I see the following:
Video stream found: h264
Audio stream found: aac
Content type: application/octet-stream
According to the YouTube Wikipedia page, both MP4 and FLV can contain H.264 and AAC, so I am not sure how I can transcode the video part.
An additional challenge, I think, is that even if I transcode the video part, will it suit the Flash player? This separation of the video and audio data is kind of tricky.
Can you help a bit?
Anil,
YouTube recently changed the way it supplies content to desktop and mobile devices; depending on your region, most of the content you will see is MPEG-DASH, so it cannot be transcoded to the FLV format. MPEG-DASH uses video fragments of small duration, just like the TS segments you see in the HLS protocol. You should try the same using an iPhone or Android device on YouTube to get the info, or try to modify your parser code so it can parse the MIME type more accurately.
Let me know if this helps.
Request you to check your LinkedIn mailbox; I've a message for you.
OK, I did not get it when you said 'transcoding to FLV cannot be done because the contents are based on MPEG-DASH'. Is it because they do not send the content as a container (as I observed), or is there some other reason? DASH is codec-independent, I guess, right?
Yes, YouTube is now sending the content in fragmented form like in HLS, and I can see - for longer videos - fixed-length chunks of video and audio in the Fiddler HTTP trace.
I presume that when you said to use an iPhone or Android device, you meant to see the content as a container (e.g. FLV or MP4) rather than as the separate video and audio streams, which perhaps happens for desktop devices. I will try that by changing the User-Agent header from my browser (I will try some User-Agent switcher, as I do not have an iPhone or Android device with me).
As far as the parser is concerned, I have the Xuggler APIs compiled, and I've written some sample code to dump the media info. For example, see below: I'm printing info for the separate video and audio streams, and for one FLV container file.
If you look at the first two dumps, you will understand why I think there is a problem transcoding the video/audio streams: they do not show any specific container details.
######################
file /home/anilj1/javaWs/GetContainerInfo/testdata/audio_sample
Input format long name: QuickTime/MPEG-4/Motion JPEG 2000 format
Output format long name: null
Input format short name: mov,mp4,m4a,3gp,3g2,mj2
Output format short name: null
1 stream;
duration (ms): 20015;
start time (ms): 0;
file size (bytes): 176128;
bit rate: 70396;
stream 0:
type: CODEC_TYPE_AUDIO;
codec: CODEC_ID_AAC;
duration: 882688;
start time: 0;
language: und;
timebase: 1/44100;
coder tb: 1/44100;
sample rate: 44100;
channels: 1;
format: FMT_S16
#####################################
file /home/anilj1/javaWs/GetContainerInfo/testdata/video_sample
Input format long name: QuickTime/MPEG-4/Motion JPEG 2000 format
Output format long name: null
Input format short name: mov,mp4,m4a,3gp,3g2,mj2
Output format short name: null
1 stream;
duration (ms): 15015;
start time (ms): 0;
file size (bytes): 507904;
bit rate: 270611;
stream 0:
type: CODEC_TYPE_VIDEO;
codec: CODEC_ID_H264;
duration: 1351350;
start time: 0;
language: und;
timebase: 1/90000;
coder tb: 1001/60000;
width: 320;
height: 240;
format: YUV420P;
frame-rate: 29.97;
#####################################
file /home/anilj1/javaWs/GetContainerInfo/testdata/ashton_kutcher.flv
Input format long name: FLV format
Output format long name: null
Input format short name: flv
Output format short name: null
2 streams;
duration (ms): 14072;
start time (ms): 0;
file size (bytes): 1184122;
bit rate: 673179;
stream 0:
type: CODEC_TYPE_VIDEO;
codec: CODEC_ID_H264;
duration: unknown;
start time: 0;
language: unknown;
timebase: 1/1000;
coder tb: 500/29917;
width: 720;
height: 480;
format: YUV420P;
frame-rate: 29.92;
stream 1:
type: CODEC_TYPE_AUDIO;
codec: CODEC_ID_AAC;
duration: unknown;
start time: 0;
language: unknown;
timebase: 1/1000;
coder tb: 1/44100;
sample rate: 44100;
channels: 2;
format: FMT_S16
#####################################
Alright, after switching the UA value, I got the MP4-container-based videos. I guess it will be easy to process the videos now.
The only problem I see is that, at the proxy, I have to wait until the video file (I mean the HTTP response message payload) is downloaded before I can process it (e.g. transcode it).
I am not sure if inline (or online) transcoding is possible, and how. However, so far this is good for my requirement, I think. Will keep you posted.
Good. ... May I know what you are trying to do? Which company do you work for?
I am doing my Masters at the University of Maryland, Baltimore County. Currently working as a Research Assistant. When you get time, note that we're now connected on LinkedIn.
Did you get a chance to look at my previous post? I've added some more details.