Friday, March 14, 2014

HEVC RTP Payload Format

RTP Header Usage

The format of the RTP header is specified in [RFC3550] and reprinted in Figure 2 for convenience. This payload format uses the fields of the header in a manner consistent with that specification. The RTP payload (and the settings for some RTP header bits) for aggregation packets and fragmentation units 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |V=2|P|X| CC |M| PT | sequence number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | synchronization source (SSRC) identifier | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | contributing source (CSRC) identifiers | | .... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 2 RTP header according to [RFC3550] The RTP header information to be set according to this RTP payload format is set as follows:
Marker bit (M): 1 bit

      Set for the last packet, carried in the current RTP stream, of
      the access unit, in line with the normal use of the M bit in
      video formats, to allow an efficient playout buffer handling.
      When MSM is in use, if an access unit appears in multiple RTP
      streams, the marker bit is set on each RTP stream's last
      packet of the access unit.

         Informative note: The content of a NAL unit does not tell
         whether or not the NAL unit is the last NAL unit, in
         decoding order, of an access unit.  An RTP sender
         implementation may obtain this information from the video
         encoder.  If, however, the implementation cannot obtain
         this information directly from the encoder, e.g. when the
         bitstream was pre-encoded, and also there is no timestamp
         allocated for each NAL unit, then the sender implementation
         can inspect subsequent NAL units in decoding order to
         determine whether or not the NAL unit is the last NAL unit
         of an access unit as follows.  A NAL unit naluX is the last
         NAL unit of an access unit if it is the last NAL unit of
         the bitstream or the next VCL NAL unit naluY in decoding
         order has the high-order bit of the first byte after its
         NAL unit header equal to 1, and all NAL units between naluX
         and naluY, when present, have nal_unit_type in the range of
         32 to 35, inclusive, equal to 39, or in the ranges of 41 to
         44, inclusive, or 48 to 55, inclusive.

   Payload type (PT): 7 bits

      The assignment of an RTP payload type for this new packet
      format is outside the scope of this document and will not be
      specified here.  The assignment of a payload type has to be
      performed either through the profile used or in a dynamic way.

         Informative note: It is not required to use different
         payload type values for different RTP streams in MSM.

   Sequence number (SN): 16 bits
Timestamp: 32 bits

      The RTP timestamp is set to the sampling timestamp of the
      content.  A 90 kHz clock rate MUST be used.

      If the NAL unit has no timing properties of its own (e.g.
      parameter set and SEI NAL units), the RTP timestamp MUST be
      set to the RTP timestamp of the coded picture of the access
      unit in which the NAL unit (according to Section of
      [HEVC]) is included.

      Receivers MUST use the RTP timestamp for the display process,
      even when the bitstream contains picture timing SEI messages
      or decoding unit information SEI messages as specified in
      [HEVC].  However, this does not mean that picture timing SEI
      messages in the bitstream should be discarded, as picture
      timing SEI messages may contain frame-field information that
      is important in appropriately rendering interlaced video.

   Synchronization source (SSRC): 32-bits

      Used to identify the source of the RTP packets.  In SSM, by
      definition a single SSRC is used for all parts of a single
      bitstream.  In MSM, each SSRC is used for an RTP stream
      containing a subset of the sub-layers for a single (temporally
      scalable) bitstream.  A receiver is required to correctly
      associate the set of SSRCs that are included parts of the same

         Informative note: The term "bitstream" in this document is
         equivalent to the term "encoded stream" in [I-D.ietf-

Single NAL Unit Packets

A single NAL unit packet contains exactly one NAL unit, and consists of a payload header (denoted as PayloadHdr), a conditional 16-bit DONL field (in network byte order), and the NAL unit payload data (the NAL unit excluding its NAL unit header) of the contained NAL unit, as shown in Figure

0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |           PayloadHdr          |      DONL (conditional)       |
   |                                                               |
   |                  NAL unit payload data                        |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |

            Figure 3 The structure a single NAL unit packet

   The payload header SHOULD be an exact copy of the NAL unit header
   of the contained NAL unit.  However, the Type (i.e.
   nal_unit_type) field MAY be changed, e.g. when it is desirable to
   handle a CRA picture to be a BLA picture [JCTVC-J0107].

   The DONL field, when present, specifies the value of the 16 least
   significant bits of the decoding order number of the contained
   NAL unit.  If tx-mode is equal to "MSM" or sprop-max-don-diff is
   greater than 0, the DONL field MUST be present, and the variable
   DON for the contained NAL unit is derived as equal to the value
   of the DONL field.  Otherwise (tx-mode is equal to "SSM" and
   sprop-max-don-diff is equal to 0), the DONL field MUST NOT be

4.7 Aggregation Packets (APs)

Aggregation packets (APs) are introduced to enable the reduction of packetization overhead for small NAL units, such as most of the non-VCL NAL units, which are often only a few octets in size. An AP aggregates NAL units within one access unit. Each NAL unit to be carried in an AP is encapsulated in an aggregation unit. NAL units aggregated in one AP are in NAL unit decoding order. An AP consists of a payload header (denoted as PayloadHdr) followed by two or more aggregation units, as shown in Figure
0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |    PayloadHdr (Type=48)       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |             two or more aggregation units                     |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |

            Figure  The structure of an aggregation packet

   The fields in the payload header are set as follows.  The F bit
   MUST be equal to 0 if the F bit of each aggregated NAL unit is
   equal to zero; otherwise, it MUST be equal to 1.  The Type field
   MUST be equal to 48.  The value of LayerId MUST be equal to the
   lowest value of LayerId of all the aggregated NAL units.  The
   value of TID MUST be the lowest value of TID of all the
   aggregated NAL units.

      Informative Note: All VCL NAL units in an AP have the same TID
      value since they belong to the same access unit.  However, an
      AP may contain non-VCL NAL units for which the TID value in
      the NAL unit header may be different than the TID value of the
      VCL NAL units in the same AP.

   An AP MUST carry at least two aggregation units and can carry as
   many aggregation units as necessary; however, the total amount of
   data in an AP obviously MUST fit into an IP packet, and the size
   SHOULD be chosen so that the resulting IP packet is smaller than
   the MTU size so to avoid IP layer fragmentation.  An AP MUST NOT
   contain Fragmentation Units (FUs) specified in section 4.8.  APs
   MUST NOT be nested; i.e. an AP MUST NOT contain another AP.

   The first aggregation unit in an AP consists of a conditional 16-
   bit DONL field (in network byte order) followed by a 16-bit
   unsigned size information (in network byte order) that indicates
   the size of the NAL unit in bytes.

Fragmentation Units (FUs)

Fragmentation units (FUs) are introduced to enable fragmenting a single NAL unit into multiple RTP packets, possibly without cooperation or knowledge of the HEVC encoder. A fragment of a NAL unit consists of an integer number of consecutive octets of that NAL unit. Fragments of the same NAL unit MUST be sent in consecutive order with ascending RTP sequence numbers (with no other RTP packets within the same RTP stream being sent between the first and last fragment). When a NAL unit is fragmented and conveyed within FUs, it is referred to as a fragmented NAL unit. APs MUST NOT be fragmented. FUs MUST NOT be nested; i.e. an FU MUST NOT contain a subset of another FU.

The RTP timestamp of an RTP packet carrying an FU is set to the
   NALU-time of the fragmented NAL unit.

   An FU consists of a payload header (denoted as PayloadHdr), an FU
   header of one octet, a conditional 16-bit DONL field (in network
   byte order), and an FU payload, as shown in Figure 9.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
   | DONL (cond)   |                                               |
   |-+-+-+-+-+-+-+-+                                               |
   |                         FU payload                            |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |

                    Figure 9 The structure of an FU

   The fields in the payload header are set as follows.  The Type
   field MUST be equal to 49.  The fields F, LayerId, and TID MUST
   be equal to the fields F, LayerId, and TID, respectively, of the
   fragmented NAL unit.

   The FU header consists of an S bit, an E bit, and a 6-bit FuType
   field, as shown in Figure 10.

                            |S|E|  FuType   |

                 Figure 10   The structure of FU header