Tuesday, January 14, 2014

VP9 Video Codec

Video data accounts for a significant proportion of all internet traffic, and the trend is toward higher-quality, larger-format and often professionally produced video, encoded at higher data rates and supported by the improved provisioning of high-bandwidth internet connections. VP9 is being developed as an open source solution tailored to the specific characteristics of the internet, under the auspices of the WebM project, with the aim of providing the highest quality user experience and the ability to support the widest range of use cases on a diverse set of target devices. This document provides a high-level technical overview of the coding tools that will likely be included in the final VP9 bitstream. A large proportion of the advance that VP9 has made over VP8 can be attributed to a straightforward generational progression, driven by the need for greater efficiency to handle a new coding "sweet spot" that has evolved to support larger frame sizes and higher quality video formats.
VP9 has many design improvements compared to VP8. In particular, VP9 will support superblocks, which are coded using a quadtree structure.
Google focused mainly on addressing the following known weaknesses of VP8 relative to its arch rival, H.264.

Tiled Encoding for Higher Resolution
1. Tiled encoding of new content on YouTube. (Re-encode of older content in Q1.)
2. Tile counts: 480p and 720p use 2 tiles, 1080p uses 4 tiles and 4K uses 8 tiles (360p with 2 tiles is coming in Q1 2014).
New prediction modes
A large part of the coding efficiency improvement achieved by VP9 can be attributed to the introduction of larger prediction block sizes. Specifically, VP9 introduces superblocks of size up to 64x64 and their quadtree-like decomposition all the way down to a block size of 4x4, with some quirks as described below. In particular, a superblock of size 64x64 (SB64) can be split into four superblocks of size 32x32 (SB32), each of which can be further split into 16x16 macroblocks (MB). Each SB64, SB32 or MB can be predicted as a whole using a conveyed INTRA prediction mode, or an INTER prediction mode with up to two motion vectors and corresponding reference frames, as described in Section 3.2. A macroblock can be further split using one of three mode families:
B_PRED – each 4x4 sub-block within the MB can be coded using a signaled 4x4 INTRA prediction mode.
I8X8_PRED – each 8x8 block within the MB can be coded using a signaled 8x8 INTRA prediction mode.
SPLITMV – each 4x4 sub-block within the MB is coded in INTER mode with a corresponding motion vector, but with the option of grouping common motion vectors over 16x8, 8x16 or 8x8 partitions within the MB.
Note that the B_PRED and SPLITMV modes in VP9 work in the same way as they do in VP8. There have also been experiments with 1/8-pel and sub-pixel motion estimation.
In short, the following prediction modes are available:
1. Intra modes
2. Inter mode
3. Compound INTER-INTRA Mode
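The superblock decomposition described in this section can be pictured as a recursive quadtree split. The following is only a minimal sketch, not encoder code: the `Block` type, the `partition` function and the `split_top_left` decision rule are hypothetical names introduced for illustration, and a real encoder would decide whether to split based on rate-distortion cost rather than position.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Block:
    x: int      # left edge in pixels
    y: int      # top edge in pixels
    size: int   # width and height in pixels


def partition(block: Block,
              should_split: Callable[[Block], bool],
              min_size: int = 16) -> List[Block]:
    """Split a square block into four quadrants until the rule says stop.

    min_size=16 stops at macroblock level, matching SB64 -> SB32 -> MB.
    """
    if block.size <= min_size or not should_split(block):
        return [block]
    half = block.size // 2
    leaves: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            child = Block(block.x + dx, block.y + dy, half)
            leaves.extend(partition(child, should_split, min_size))
    return leaves


def split_top_left(block: Block) -> bool:
    """Hypothetical rule: only keep splitting the top-left corner."""
    return block.x == 0 and block.y == 0


leaves = partition(Block(0, 0, 64), split_top_left)
for leaf in leaves:
    print(leaf)
```

With this rule the SB64 becomes four SB32 blocks, and only the top-left SB32 is split again into four macroblocks. The leaves always tile the original 64x64 area exactly.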
Sub-Pixel Interpolation
The filters used for sub-pixel interpolation of fractional motion are critical to the performance of a video codec. The maximum motion vector precision supported is 1/8-pixel, with the option of switching between 1/4-pixel and 1/8-pixel precision using a frame-level flag. If 1/8-pixel precision is used in the frame, however, it is only used for small motion, depending on the magnitude of the reference motion vector. For larger motion, indicated by a larger reference, there is almost always motion blur, which obviates the need for higher precision interpolation. VP9 defines a family of three 8-tap filters, selectable at either the frame or macroblock level in the bitstream:
8-tap Regular: An 8-tap Lagrangian interpolation filter designed using the intfilt function in MATLAB.
8-tap Sharp: A DCT-based interpolation filter with a sharper response, used mostly around sharper edges.
8-tap Smooth: A smoothing filter with a softer response than the other two.
The simple concept is:
Sub-pel precision and filters can have a big impact on compression efficiency, BUT
Longer filters -> Better interpolation (but more cost)
Higher sub-pel precision -> Better interpolation (but more cost)
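To make the idea of an 8-tap sub-pel filter concrete, here is a minimal sketch that computes textbook Lagrange interpolation weights for a fractional position and applies them to a row of samples. This is an illustration only: these are not the VP9 filter coefficients, which are fixed-point integer tables defined by the codec, and the names `lagrange_taps` and `interpolate` are hypothetical.

```python
from fractions import Fraction
from typing import List

# The 8 integer sample positions used around the target: -3 .. 4.
TAP_POSITIONS = range(-3, 5)


def lagrange_taps(frac: Fraction) -> List[Fraction]:
    """Lagrange weights for position `frac` (0 <= frac < 1) between samples 0 and 1."""
    taps = []
    for i in TAP_POSITIONS:
        weight = Fraction(1)
        for j in TAP_POSITIONS:
            if j != i:
                weight *= (frac - j) / Fraction(i - j)
        taps.append(weight)
    return taps


def interpolate(samples: List[int], index: int, frac: Fraction) -> Fraction:
    """Value at position index + frac, using the 8 neighbours index-3 .. index+4."""
    window = samples[index - 3: index + 5]
    return sum(t * s for t, s in zip(lagrange_taps(frac), window))


# A linear ramp: 0, 10, 20, ... so the exact answer is easy to check.
ramp = list(range(0, 160, 10))

# 1/8-pel precision means eight possible fractional offsets per pixel.
for k in range(8):
    frac = Fraction(k, 8)
    print(f"{k}/8 pel -> {float(interpolate(ramp, 5, frac))}")
```

Each extra tap adds one multiply-accumulate per output pixel, and finer precision means more filter phases to store and select between. That is the cost side of the trade-off listed above.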
Entropy coding
Entropy coding in VP9 is much more adaptive to extremes and better behaved for large image formats. Coding of complex uncorrelated motion and very dense residual signals is an area where we need to improve.
  • New predictive models and contexts.
  • More efficient updates / adaptation.
  • Segment level and SB level adjustments
  • Reference frame contextual coding
  • Expanding the previous coef-contexts
  • Modifications to coding of explicit segment map (differential coding option and contexts)
  • Separate coding context for different frame types
Next steps: Major overhaul of how motion is coded
New Transform Modes
VP9 supports the Discrete Cosine Transform (DCT) at sizes 4x4, 8x8, 16x16 and 32x32, with a 64x64 superblock size under development, and removes the second-order transform that was employed in VP8. Only transform sizes equal to, or smaller than, the prediction block size may be specified. Modes B_PRED and 4x4 SPLITMV are thus restricted to using only the 4x4 transform; modes I8X8_PRED and non-4x4 SPLITMV can use either the 4x4 or 8x8 transform; full-size (16x16) macroblock predictors can be coupled with either the 4x4, 8x8 or 16x16 transforms; and superblocks can use any transform size up to 32x32. Further restrictions on the available subset of transforms can be signaled at the frame level, by specifying a maximum allowable transform size, or at the macroblock level, by explicitly signaling which of the available transform sizes is used. In addition, VP9 introduces support for a new transform type, the Asymmetric Discrete Sine Transform (ADST), which can be used in combination with specific intra-prediction modes.
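The transform size rules above amount to a small lookup. The sketch below encodes them as stated in the text. The mode labels used as dictionary keys (for example `SPLITMV_NON_4X4` and `MB16`) and the function name `allowed_transforms` are hypothetical names chosen for this illustration, not bitstream syntax.

```python
from typing import List

TRANSFORM_SIZES = [4, 8, 16, 32]

# Largest transform each prediction mode may use, as described in the text.
# The keys are illustrative labels, not identifiers from the bitstream.
MAX_TX_FOR_MODE = {
    "B_PRED": 4,
    "SPLITMV_4X4": 4,
    "I8X8_PRED": 8,
    "SPLITMV_NON_4X4": 8,
    "MB16": 16,   # full-size 16x16 macroblock predictor
    "SB32": 32,
    "SB64": 32,   # superblocks are capped at the 32x32 transform
}


def allowed_transforms(mode: str, frame_max_tx: int = 32) -> List[int]:
    """Transform sizes usable for `mode`, given the frame-level maximum."""
    limit = min(MAX_TX_FOR_MODE[mode], frame_max_tx)
    return [size for size in TRANSFORM_SIZES if size <= limit]


print(allowed_transforms("B_PRED"))                 # [4]
print(allowed_transforms("MB16"))                   # [4, 8, 16]
print(allowed_transforms("SB64", frame_max_tx=8))   # [4, 8]
```

The frame-level maximum only narrows the choice. Which of the remaining sizes is actually used is then signaled per macroblock.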
Segmentation
VP9 offers an optional feature known as segmentation. When enabled, the bitstream codes a segment ID for each block, which is a number between 0 and 7. Each of these eight segments can have any of the following four features enabled:
Skip – blocks belonging to a segment with the skip feature active are automatically assumed to not have a residual signal. Useful for static background.
Alternate quantizer – blocks belonging to a segment with the AltQ feature may use a different inverse quantization scale factor. Useful for regions that require more (or less) detail than the rest of the picture. It could also be used for rate control.
Ref – blocks belonging to a segment that have the Ref feature enabled are assumed to point to a particular reference frame (Last, Golden, or AltRef), as opposed to the bitstream explicitly transmitting the reference frame as usual.
AltLf – blocks belonging to a segment that have the AltLf feature enabled use a different smoothing strength when getting loop-filtered. This can be useful for smooth areas that would otherwise appear too blocky. They can get more smoothing without having to smooth the entire picture more.
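One way to picture the four segment features is as optional overrides applied on top of frame-level defaults. The sketch below is an assumption-laden illustration: the types `SegmentFeatures` and `BlockSettings`, the function `resolve_block`, and all numeric values are hypothetical, and AltQ and AltLf are treated here as simple replacement values, which may not match how the bitstream actually codes them.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_SEGMENTS = 8  # segment IDs run from 0 to 7


@dataclass
class SegmentFeatures:
    skip: bool = False               # Skip: block has no residual
    alt_q: Optional[int] = None      # AltQ: replacement quantizer (assumed absolute)
    ref_frame: Optional[str] = None  # Ref: implied reference frame
    alt_lf: Optional[int] = None     # AltLf: replacement loop-filter strength


@dataclass
class BlockSettings:
    has_residual: bool
    quantizer: int
    ref_frame: str
    loop_filter: int


def resolve_block(seg: SegmentFeatures, frame_q: int, frame_lf: int,
                  coded_ref: str, coded_has_residual: bool) -> BlockSettings:
    """Apply the segment's enabled features over the frame-level defaults."""
    return BlockSettings(
        has_residual=False if seg.skip else coded_has_residual,
        quantizer=frame_q if seg.alt_q is None else seg.alt_q,
        ref_frame=coded_ref if seg.ref_frame is None else seg.ref_frame,
        loop_filter=frame_lf if seg.alt_lf is None else seg.alt_lf,
    )


segments: List[SegmentFeatures] = [SegmentFeatures() for _ in range(NUM_SEGMENTS)]
segments[1] = SegmentFeatures(skip=True, ref_frame="Last")  # static background
segments[2] = SegmentFeatures(alt_q=20, alt_lf=48)          # detailed region

background = resolve_block(segments[1], frame_q=40, frame_lf=16,
                           coded_ref="Golden", coded_has_residual=True)
detail = resolve_block(segments[2], frame_q=40, frame_lf=16,
                       coded_ref="AltRef", coded_has_residual=True)
print(background)
print(detail)
```

Because the features are stored per segment rather than per block, a block only needs to carry its segment ID to inherit all of them.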

Loop Filter
Like AVC and HEVC, VP9 specifies a loop filter that is applied to the whole picture after it has been decoded. It attempts to clean up the blocky artifacts that can occur. The loop filter operates per superblock, filtering first the vertical edges, then the horizontal edges of each superblock. The superblocks are processed in raster order, regardless of any tile structure. This is unlike HEVC, where all vertical edges of the frame are filtered before any horizontal edges. There are 4 different filters:
16-wide, 8 pixels on each side of the edge
8-wide, 4 pixels on each side of the edge
4-wide, 2 pixels on each side of the edge
2-wide, 1 pixel on each side of the edge
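The processing order described above (superblocks in raster order, vertical edges before horizontal edges within each one) can be sketched as a simple generator. This only shows the ordering, not the filtering itself. The superblock size of 64 and the names `loop_filter_order` and `PIXELS_PER_SIDE` are assumptions made for this illustration.

```python
from typing import Iterator, Tuple

SB_SIZE = 64  # assumed superblock size in pixels

# Filter width -> pixels touched on each side of the edge, from the list above.
PIXELS_PER_SIDE = {16: 8, 8: 4, 4: 2, 2: 1}


def loop_filter_order(width: int, height: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (sb_x, sb_y, pass_name) in the order the loop filter visits them."""
    for sb_y in range(0, height, SB_SIZE):        # rows top to bottom
        for sb_x in range(0, width, SB_SIZE):     # superblocks left to right
            yield (sb_x, sb_y, "vertical_edges")
            yield (sb_x, sb_y, "horizontal_edges")


for step in loop_filter_order(128, 128):
    print(step)
```

Under the HEVC ordering mentioned above, by contrast, every vertical-edge pass for the whole frame would come before any horizontal-edge pass.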

Scale better for larger images
Comparison to existing codecs
The paper below compares the performance of H.264/MPEG-AVC, H.265/MPEG-HEVC (High Efficiency Video Coding) and the recently published proprietary video coding scheme VP9, and discusses the current status and updates:
http://iphome.hhi.de/marpe/download/Performance_HEVC_VP9_X264_PCS_2013_preprint.pdf

Industry coverage

Who is supporting what?
H.265 versus VP9 is a little like HDMI versus DisplayPort in that the latter’s royalty-free approach should give it the edge, but the former’s ubiquitous legacy means it has widespread industry support. Previously this made H.264 an easy winner over VP8. This time around things are closer. Google used CES 2014 to show that VP9 has support from LG, Panasonic, Sony, Samsung, Toshiba, Philips, Sharp, ARM, Intel, Nvidia, Qualcomm, Realtek Semiconductor and Mozilla. As mentioned, Google has also built VP9 support into its Chrome browser and YouTube.
The flip side is all these companies have also backed H.265 and even Google will support it in Chrome and hasn’t ruled out YouTube support. In fact, this led to an amusing quote from Francisco Varela, YouTube global head of platform partnerships, that "We are not announcing that we will not support HEVC." Consequently most companies look like they will support both formats, much like you’d be hard pressed to find an audio player that doesn’t support both MP3 and AAC.

Do we need to worry about format support?
With the decline of physical media and the rise of 4K Ultra HD, there has never been greater pressure on new video compression standards to deliver. Thankfully both do, if in slightly different ways, and – unlike past format wars – there is likely to be space for both as the industry seems reluctant to wholly commit to a) a future paying license fees, or b) being beholden to Google. That means it's very likely that most devices you buy will support both, which is great news for everyone.
