Streaming Media and RTOS: 2014

Friday, December 26, 2014

Technology Trends to Watch in 2015

Top Strategic Technology Trends for 2015

1. The Internet of Things
The Internet of Things finally became dinner table conversation (well, sort of) in 2014 thanks to Google Inc., which helped to mainstream IoT with its $3.2 billion purchase of smart home thermostat maker Nest Labs.

Home automation will continue to attract new attention next year and big players will continue to pour money into smartening up everyday items.

2. 3D Printing
The ability to print real-world, in-demand objects will become even easier and more widely applicable in 2015. An astronaut on the International Space Station used a 3D printer to make a socket wrench in space. Worldwide shipments of 3D printers are expected to grow 90 percent in 2015, followed by a doubling of unit shipments in 2016.

3. BLE (Bluetooth Low Energy) and iBeacon to add new options for event planners and participants
Gamification and scavenger hunts (as used at CES 2014).
Location information and navigation assistance: a geofence can show attendees where they are on a map and guide them to where they wish to go.
Personalized welcome and other location-based alert notifications upon arrival: for example, when an attendee enters the geofence, the app sends a notification to the badge printing location and the badge is printed.

4. Cloud/Client Computing/Analytics
The convergence of cloud and mobile computing will continue to promote the growth of centrally coordinated applications that can be delivered to any device.
"Cloud is the new style of elastically scalable, self-service computing, and both internal applications and external applications will be built on this new style," said Mr. Cearley.

In the near term, the focus for cloud/client will be on synchronizing content and application state across multiple devices and addressing application portability across devices.

Today, mobile event apps offer an unprecedented amount of analytic data: a goldmine of useful, real-time information to improve the event experience. Every touch is trackable!

5. Software-Defined Applications and Infrastructure (SDN/NFV/SaaS/PaaS/NaaS/IaaS)
Agile programming of everything from applications to basic infrastructure is essential to enable organizations to deliver the flexibility required to make the digital business work. Software-defined networking, storage, data centers and security are maturing. Cloud services are software-configurable through API calls, and applications, too, increasingly have rich APIs to access their function and content programmatically. To deal with the rapidly changing demands of digital business and scale systems up — or down — rapidly, computing has to move away from static to dynamic models. Rules, models and code that can dynamically assemble and configure all of the elements needed from the network through the application are needed.

6. Smart Wearables:
Google may have brought the modern-day market for wearables to life when it unveiled Google Glass in 2012, but the market has struggled to attract widespread adoption. Today, high-tech fitness bands such as Fitbit, Jawbone and Netpulse continue to win over consumers who want an easy way to track their calories, but Apple Inc. and Sony Corp. are among the companies hoping to make wearable waves in 2015. “Smart Wearables will see strong growth with the entry of the Apple Watch and refined offerings from other players,” said Ross Rubin.

Apple will start selling its much-anticipated Apple Watch in the first half of next year. It will be interesting to see whether the tech takes off (it is Apple, after all) or whether consumers decide that a smartwatch is not the solution they need. One thing is certain: sales of Samsung’s Galaxy Gear have been a disappointment.

7. Augmented reality/Virtual reality
Augmented reality, meaning technologies that overlay visuals on the user's view of the real world, continues to attract the interest of developers. In 2015, it will be adopted into more commercial applications. Sony is expected to unveil a Google Glass-like headset at CES next month that can be affixed to a person’s regular lenses and superimpose high-resolution OLED images, videos and text in front of a person’s eye. Sony reportedly plans to start mass producing the smart eyewear in 2015.

Meanwhile, Facebook, Sony, Google and Samsung have all expressed interest in partnering with Hollywood to make virtual reality a, well, reality. New Deal Studios and other boutique movie studios have recently started to develop 360-degree films designed specifically for virtual reality headsets.

Other technologies to watch in 2015
4K and beyond. Moore’s law applied to pixels has been incredible. Apple’s 5K iMac topped off a year where we saw 4K displays for hundreds of dollars. In mobile, pixel density will increase (to the degree that battery life, OS and hardware can keep up) and for desktop and wall, screen size will continue to increase. Wall-sized displays, wireless transmission and hopefully touch will introduce a whole new range of potential solutions for collaboration, signage and education.

Tuesday, November 25, 2014

Preface of PCR, PTS/DTS & Time Base Alterations according to ISO/IEC 13818 Part 1, Annexure D, J

Given that a decoding system receives a compliant bit stream that is delivered correctly in accordance with the timing model it is straightforward to implement the decoder such that it produces as output high quality audio and video which are properly synchronized. There is no normative requirement, however, that decoders be implemented in such a way as to provide such high quality presentation output. In applications where the data are not delivered to the decoder with correct timing, it may be possible to produce the desired presentation output; however such capabilities are not in general guaranteed.

The audio and video sample rates at the encoder are significantly different from one another, and may or may not have an exact and fixed relationship to one another, depending on whether the combined stream is a Program Stream or a Transport Stream, and on whether the System_audio_locked and System_video_locked flags are set in the Program Stream. The duration of a block of audio samples (an audio presentation unit) is generally not the same as the duration of a video picture.

[Within the coding of this Recommendation, data are timestamps concerning the presentation and decoding of video pictures and blocks of audio samples. The pictures and blocks are called "Presentation Units", abbreviated PU. The sets of coded bits which represent the PUs and which are included within the ISO/IEC 13818-1 bit stream are called "Access Units", abbreviated AU. An audio access unit is abbreviated AAU, and a video access unit is abbreviated VAU. In ISO/IEC 13818-3, the term "audio frame" has the same meaning as AAU or APU (audio presentation unit) depending on the context. A video presentation unit (VPU) is a picture, and a VAU is a coded picture.]

Some, but not necessarily all, AAUs and VAUs have associated with them PTSs. A PTS indicates the time that the PU which results from decoding the AU which is associated with the PTS should be presented to the user. The audio PTSs and video PTSs are both samples from a common time clock, which is referred to as the System Time Clock or STC. With the correct values of audio and video PTSs included in the data stream, and with the presentation of the audio and video PUs occurring at the time indicated by the appropriate PTSs in terms of the common STC, precise synchronization of the presented audio and video is achieved at the decoding system. While the STC is not part of the normative content of this Recommendation, and the equivalent information is conveyed in this Recommendation via such terms as the system_clock_frequency, the STC is an important and convenient element for explaining the timing model, and it is generally practical to implement encoders and decoders which include an STC in some form.

[In particular, in the STD model each of the operations performed on the bit stream in the decoder is performed instantaneously, with the obvious exception of the time that bits spend in the decoder buffers. In a real decoder system the individual audio and video decoders do not perform instantaneously, and their delays must be taken into account in the design of the implementation. For example, if video pictures are decoded in exactly one picture presentation interval 1/P, where P is the frame rate, and compressed video data are arriving at the decoder at bit rate R, the completion of removing bits associated with each picture is delayed from the time indicated in the PTS and DTS fields by 1/P, and the video decoder buffer must be larger than that specified in the STD model by R/P. The video presentation is likewise delayed with respect to the STD, and the PTS should be handled accordingly. Since the video is delayed, the audio decoding and presentation should be delayed by a similar amount in order to provide correct synchronization. Delaying decoding and presentation of audio and video in a decoder may be implemented for example by adding a constant to the PTS values when they are used within the decoder.]
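The delay arithmetic in the bracketed note can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and it assumes the usual MPEG-2 convention that PTS/DTS values are 33-bit counts of a 90 kHz clock (system_clock_frequency / 300), which the note itself does not restate.

```python
PTS_CLOCK_HZ = 90_000      # assumption: PTS/DTS tick at 27 MHz / 300
PTS_MODULUS = 1 << 33      # assumption: PTS/DTS are 33-bit counters


def decoder_delay_ticks(frame_rate):
    """One picture interval (1/P) expressed in 90 kHz PTS ticks."""
    return round(PTS_CLOCK_HZ / frame_rate)


def extra_buffer_bits(bit_rate, frame_rate):
    """Extra buffer (R/P bits) needed on top of the STD model size."""
    return bit_rate / frame_rate


def delayed_pts(pts, delay_ticks):
    """Add a constant delay to a PTS, wrapping at 33 bits."""
    return (pts + delay_ticks) % PTS_MODULUS


# Example: a 25 frame/s, 15 Mbit/s video stream
delay = decoder_delay_ticks(25)
extra = extra_buffer_bits(15_000_000, 25)
```

The same constant would be added to the audio PTS values so that audio and video stay aligned.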

Data from the Transport Stream enters the decoder at a piecewise constant rate. The time t(i) at which the i-th byte enters the decoder is defined by decoding the program clock reference (PCR) fields in the input stream, encoded in the Transport Stream packet adaptation field of the program to be decoded and by counting the bytes in the complete Transport Stream between successive PCRs of that program. The PCR field is encoded in two parts: one, in units of the period of 1/300 times the system clock frequency, called program_clock_reference_base, and one in units of the period of the system clock frequency, called program_clock_reference_extension. The values encoded in these are computed by PCR_base(i) and PCR_ext(i) respectively. The value encoded in the PCR field indicates the time t(i), where i is the index of the byte containing the last bit of the program_clock_reference_base field.

Specifically:

PCR(i) = PCR_base(i) × 300 + PCR_ext(i)

where: PCR_base(i) = ((system_clock_frequency × t(i)) DIV 300) % 2^33, and,

PCR_ext(i) = ((system_clock_frequency × t(i)) DIV 1) % 300
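As a rough illustration of the two formulas, the sketch below splits a system clock tick count into base and extension and joins them again. The function names are hypothetical, and the 27 MHz value is the nominal MPEG-2 system clock frequency, stated here as an assumption. The input is taken as an integer tick count (system_clock_frequency × t(i)) to avoid floating-point rounding.

```python
SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz, nominal value (assumption)


def split_pcr(clock_ticks):
    """Split a 27 MHz tick count into (PCR_base, PCR_ext)."""
    base = (clock_ticks // 300) % (1 << 33)  # 90 kHz part, 33 bits
    ext = clock_ticks % 300                  # 27 MHz remainder, 0..299
    return base, ext


def join_pcr(base, ext):
    """PCR(i) = PCR_base(i) * 300 + PCR_ext(i)."""
    return base * 300 + ext


# One second plus 123 ticks of the 27 MHz clock
base, ext = split_pcr(SYSTEM_CLOCK_FREQUENCY + 123)
pcr = join_pcr(base, ext)
```

Because the base wraps modulo 2^33, the joined value only equals the original tick count until the counter rolls over.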

For all other bytes, the input arrival time t(i), shown in the equation below, is computed from PCR(i'') and the transport rate at which data arrive, where the transport rate is determined as the number of bytes in the Transport Stream between the bytes containing the last bit of two successive program_clock_reference_base fields of the same program divided by the difference between the time values encoded in these same two PCR fields.

t(i) = PCR(i'') / system_clock_frequency + (i − i'') / transport_rate(i)

where,

i is the index of any byte in the Transport Stream for i'' < i < i'

i'' is the index of the byte containing the last bit of the most recent program_clock_reference_base field applicable to the program being decoded. PCR(i'') is the time encoded in the program clock reference base and extension fields in units of the system clock.

The transport rate is given by,

transport_rate(i) = ((i' − i'') × system_clock_frequency) / (PCR(i') − PCR(i''))

where,

i' is the index of the byte containing the last bit of the immediately following program_clock_reference_base field applicable to the program being decoded, such that i'' < i ≤ i'
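The interpolation between two PCRs can be sketched as follows. The function and parameter names are hypothetical, the 27 MHz clock is the nominal value (an assumption), and the sketch ignores PCR wrap-around and time base discontinuities.

```python
SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz, nominal value (assumption)


def transport_rate(i_prev, pcr_prev, i_next, pcr_next):
    """Bytes per second between two PCR-carrying bytes i'' and i'."""
    return (i_next - i_prev) * SYSTEM_CLOCK_FREQUENCY / (pcr_next - pcr_prev)


def arrival_time(i, i_prev, pcr_prev, i_next, pcr_next):
    """Arrival time t(i) in seconds of byte i, with i'' < i <= i'."""
    rate = transport_rate(i_prev, pcr_prev, i_next, pcr_next)
    return pcr_prev / SYSTEM_CLOCK_FREQUENCY + (i - i_prev) / rate


# 1000 packets of 188 bytes between PCRs that are 0.1 s apart
rate = transport_rate(0, 0, 188_000, 2_700_000)
t_mid = arrival_time(94_000, 0, 0, 188_000, 2_700_000)
```

In this example the rate works out to 1,880,000 bytes per second, and the byte halfway between the two PCRs arrives at 0.05 s.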

In the case of a time base discontinuity, indicated by the discontinuity_indicator in the transport packet adaptation field, the definition for the time of arrival of bytes at the input to the decoder is not applicable between the last PCR of the old time base and the first PCR of the new time base. In this case the time of arrival of these bytes is determined according to the equation for t(i) above, with the modification that the transport rate used is that applicable between the last and next to last PCR of the old time base.

A tolerance is specified for the PCR values. The PCR tolerance is defined as the maximum inaccuracy allowed in received PCRs. This inaccuracy may be due to imprecision in the PCR values or to PCR modification during re-multiplexing. It does not include errors in packet arrival time due to network jitter or other causes. The PCR tolerance is ± 500 ns.

Transport Streams with multiple programs and variable rate

Transport Streams may contain multiple programs which have independent time bases. Separate sets of PCRs, as indicated by the respective PCR_PID values, are required for each such independent program, and therefore the PCRs cannot be co-located. The Transport Stream rate is piecewise constant for the program entering the T-STD. Therefore, if the Transport Stream rate is variable it can only vary at the PCRs of the program under consideration. Since the PCRs, and therefore the points in the Transport Stream where the rate varies, are not co-located, the rate at which the Transport Stream enters the T-STD would have to differ depending on which program is entering the T-STD. Therefore, it is not possible to construct a consistent T-STD delivery schedule for an entire Transport Stream when that Transport Stream contains multiple programs with independent time bases and the rate of the Transport Stream is variable. It is straightforward, however, to construct constant bit rate Transport Streams with multiple variable rate programs.

The following schematic represents the Time Stamp Management (TSM) of the Broadcom BCM97405, which roughly consists of the following requirements:

1. A single software point of control. This is called the “global STC.” All other implementation details (usage of PVR offset, PCR offset, STC, etc.) should be as transparent as possible to software. This implies that the STC for audio and video decoders should match exactly.

2. Audio and video must be synchronized in time. The basic specification is to allow up to ± 20 ms of audiovisual disparity.

3. Audio and video decoders must present the same TSM interface and behave consistently. This means, for example, that if one decoder drops frames prior to receiving an initial STC in Broadcast mode, the other decoder should also do so. Similarly, if one decoder enters VSYNC mode during the same time period, then the other decoder should do so as well.

4. Error concealment must be implemented. This means that the decoder and transport software/firmware/hardware should attempt to conceal errors in a way that meets all relevant compliance testing and in such a way that host/system software involvement is minimized.

5. VSYNC mode time must be minimized. This means that when time stamp management information is available, we should use it.

6. Audio PLLs must be locked to incoming streams and VCXOs/VECs.

The decoder uses time stamps provided in the stream to control when a frame must be displayed (PTS) or decoded (DTS). These time stamps are compared to a running system clock – the STC. Because the decoder cannot always display or decode a frame exactly when the STC and PTS are equal, a small amount of latitude is given to the decoder. This latitude translates into a threshold, within which the decoder will take one action on a frame, and beyond which the decoder will take a different action.

Generally, the time line for timestamp comparisons is divided into four sections:

• Extremely early

• Early

• Late

• Extremely late.

The dividers between these sections are the “threshold” values that determine what constitutes, for example, “extremely late” vs. “late”. In a line with four sections, there must be three such dividers: the extremely early threshold, the extremely late threshold, and the point at which a frame is exactly on time, called the maturation point (PTS = STC). Each decoder may have different early or late thresholds as best suits the performance requirements for a given usage mode for that decoder. The following scenarios apply to both the broadcast and PVR usage modes.

Extremely Early Frames (PTS > STC + Threshold) Extremely early frames put a burden on the buffer model of the decoder. If they are not discarded, the buffer may overflow. The threshold value for determining whether a frame is simply early or extremely early is a function of the available buffer size to which the decoder has access. An extremely early frame will be discarded and the host will be notified of a PTS error. The decoder should not expect the host to correct the problem. This is depicted in the Figure below.

Early Frames (STC + Threshold > PTS > STC) Early frames should be presented/decoded when their PTS matures. No host or transport interaction is required. This is depicted in the Figure below.

Punctual Frames (PTS = STC) Punctual frames are those that arrive at the TSM module at the maturation point. These should be presented/decoded immediately. No host or transport interaction is required. This is depicted in the Figure below.

Late Frames (STC > PTS > STC – Threshold) Late frames should be presented/decoded immediately. No host or transport interaction is required. This is depicted in the Figure below.

Extremely Late Frames (STC – Threshold > PTS) Extremely late frames should also be discarded. In the absence of another frame/sample, the last frame/sample should be presented. The decoder should generate a PTS error to the host signifying the problem, but should not require any host interaction to correct the situation. This is depicted in the Figure below.
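The five cases above can be summarized as a small classification routine. This is a minimal sketch, not Broadcom's implementation: the names are hypothetical, PTS and STC are treated as plain integers in the same units (wrap-around is ignored), and a frame that lands exactly on a threshold is treated as merely early or late, which the text does not specify.

```python
from enum import Enum


class FrameTiming(Enum):
    EXTREMELY_EARLY = "extremely_early"  # discard, report PTS error to host
    EARLY = "early"                      # wait until the PTS matures
    PUNCTUAL = "punctual"                # present/decode immediately
    LATE = "late"                        # present/decode immediately
    EXTREMELY_LATE = "extremely_late"    # discard, report PTS error to host


def classify_frame(pts, stc, early_threshold, late_threshold):
    """Place a frame on the TSM time line relative to the current STC."""
    if pts > stc + early_threshold:
        return FrameTiming.EXTREMELY_EARLY
    if pts > stc:
        return FrameTiming.EARLY
    if pts == stc:
        return FrameTiming.PUNCTUAL
    if pts >= stc - late_threshold:
        return FrameTiming.LATE
    return FrameTiming.EXTREMELY_LATE


def should_discard(timing):
    """Only the two extreme cases cause the frame to be dropped."""
    return timing in (FrameTiming.EXTREMELY_EARLY, FrameTiming.EXTREMELY_LATE)
```

Keeping separate early and late thresholds reflects the remark that each decoder may choose thresholds to suit its usage mode.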

In addition, there is a real-time interface specification for input of Transport Streams and Program Streams to a decoder. This allows the standardization of the interface between MPEG decoders and adapters to networks, channels, or storage media. The timing effects of channels, and the inability of practical adapters to eliminate these effects completely, cause deviations from the idealized byte delivery schedule. The specification covers the real-time delivery behavior of Transport Streams and Program Streams to decoders, such that the coded data buffers in decoders are guaranteed neither to overflow nor underflow, and decoders are guaranteed to be able to perform clock recovery with the performance required by their applications.
The MPEG real-time interface specifies the maximum allowable amount of deviation from the idealized byte delivery schedule which is indicated by the Program Clock Reference (PCR) and System Clock Reference (SCR) fields encoded in the stream. Implementations of the process for recording and playback conform to the following sections. Note that this summary is not complete; refer to ETSI TS 102 819 and ETSI ES 201 812 for the full requirements.
For systems implementing the MHP Stack, the process for managing scheduled recordings includes the following activities:
1. Maintaining a list of recording requests which remain in the pending state (PENDING_WITH_CONFLICT_STATE or PENDING_WITHOUT_CONFLICT_STATE).
2. Maintaining a list of recording requests that have been successfully completed (COMPLETED_STATE), ones where recording started but failed to be successfully completed (INCOMPLETE_STATE) and ones where recording was scheduled but failed to start (FAILED_STATE).
3. Initiating the recording process for a pending recording request (in either PENDING_WITH_CONFLICT_STATE or PENDING_WITHOUT_CONFLICT_STATE) at the appropriate time, including awakening from a standby or similar power management state should this be necessary.
4. Maintaining references to recording requests that failed (FAILED_STATE) in the list of recording requests.
5. Resolving ParentRecordingRequests, initially when the request is first made and at subsequent times depending on the GEM recording specification. This includes setting the initial state of the ParentRecordingRequest and changing that state when or if more of the request is resolved.
Notes:- GEM recording specifications may define mechanisms for automatically removing requests from these lists, either when it appears the recording will never happen or based on other criteria they define. Mechanisms for resolving conflicts between recording requests (e.g. use of the tuner) are outside the scope of the present document and may be specified by GEM recording specifications. The GEM recording specification is responsible for specifying the mechanism for resolving ParentRecordingRequests.
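The bookkeeping described in the activities above can be pictured with a small sketch. Only the state names come from the text; the classes and methods below are hypothetical illustrations and are not the GEM or MHP Java API, and conflict resolution and ParentRecordingRequests are left out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    PENDING_WITH_CONFLICT_STATE = auto()
    PENDING_WITHOUT_CONFLICT_STATE = auto()
    COMPLETED_STATE = auto()
    INCOMPLETE_STATE = auto()
    FAILED_STATE = auto()


PENDING_STATES = {State.PENDING_WITH_CONFLICT_STATE,
                  State.PENDING_WITHOUT_CONFLICT_STATE}


@dataclass
class RecordingRequest:
    title: str
    start_time: float  # seconds on whatever clock the scheduler uses
    state: State = State.PENDING_WITHOUT_CONFLICT_STATE


class RecordingList:
    """Keeps every request, including failed ones, as the text requires."""

    def __init__(self):
        self.requests = []

    def add(self, request):
        self.requests.append(request)

    def pending(self):
        return [r for r in self.requests if r.state in PENDING_STATES]

    def finished(self):
        return [r for r in self.requests if r.state not in PENDING_STATES]

    def due(self, now):
        """Pending requests whose start time has been reached."""
        return [r for r in self.pending() if r.start_time <= now]
```

A real terminal would also need to wake from standby before a request becomes due, which this sketch does not model.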
The recording process includes the following activities:
1. Identifying which recordable streams shall be recorded. If the specification of the recording request identifies specific streams then those are the ones which shall be recorded. If the specification of the recording request does not identify specific streams then the default streams in the piece of content shall be recorded. When the recording is in progress and an elementary stream is added to or removed from the program, the device shall re-evaluate the streams to record as if the recording were just starting. If during this process the device determines that different streams need to be recorded it should change the streams being recorded without segmenting the recording if possible. If the device cannot change streams being recorded without segmenting the recording then it shall segment the recording.
2. Recording the identified streams, up to the limits in the recording capability of the GEM recording terminal. Where more streams of any one type should be recorded than the GEM recording terminal can record, streams shall be prioritized.
3. Identifying recordable applications and recording them, the broadcast file system(s) carrying them and sufficient data to reconstruct their AIT entries. Applications with an Application recording description where the scheduled_recording_flag is set to "0" shall not be considered as recordable. On implementations which monitor for changes in the application signaling, when such changes are detected the criteria for which applications are recordable shall be re-evaluated. If any applications which were recordable become non-recordable or vice versa then this shall be acted upon.
4. Recording sufficient information about all transmitted timelines which form part of the recording in order to enable them to be accurately reconstructed during playback.
5. Generating a media time which increments linearly at a rate of 1.0 from the beginning to the end of the recording.
6. Recording sufficient SI to identify the language associated with those recordable streams (e.g. audio) that can be specified by language, e.g. using org.davic.media.AudioLanguageControl.
7. Handling the following cases relating to the loss and re-acquisition of resources during a recording:
- the recording starts and a needed resource is not available;
- the recording is in progress and a needed resource is lost;
- the recording is in progress without a needed resource and the resource becomes available.
8. Handling the following cases relating to changes to elementary stream information during a recording:
- an elementary stream with audio, video, or data is added;
- an elementary stream with audio, video, or data is removed;
- an elementary stream with audio, video, or data is changed, e.g. the PID changed.
Notes:- The definition of the default streams to be recorded is outside the scope of the present document and should be specified by GEM recording specifications. Minimum capabilities for the number of streams of each type that GEM recording terminals should be able to record are outside the scope of the present document. The GEM recording specification is responsible for defining these. A more complete definition of which applications are recordable (and which are not) is outside the scope of the present document and should be specified by GEM recording specifications. The requirements on a GEM recording terminal to monitor for dynamic data and behavior of applications during a recording are outside the scope of the present document and should be specified by GEM recording specifications.
Managing completed recordings
The process for managing completed recordings shall include the following activities: Maintaining with all completed recordings (COMPLETED_STATE or INCOMPLETE_STATE) the following information as long as the content is retained:
- whether the recording is known to be complete or incomplete or whether this is unknown;
- whether the recording is segmented and information about each segment;
- the time and channel where the recording was made;
- the application specific data associated with the recording.
Possibly deleting the recording (including the entry in the list of recordings, the recorded data and any other associated information) once the expiration period is past for the leaf recording request corresponding to this recording.
NOTE: The present document is silent about the precision with which the expiration period must be respected. GEM recording specifications should specify how accurately it should be enforced by implementations.
Playback of scheduled recordings
Except when the initiating_replay_flag in the Application recording description of any application in the recording is set to "1", the process for playing back scheduled recordings shall include the following activities:
1) Starting the playback of recordable streams;
2) Starting the playback of recorded AUTOSTART applications where these form part of the recorded piece of content and all labeled parts of the application with the storage priority "critical to store" have been stored.
Requirements on reconstructing the dynamic behavior of recorded applications during playback are outside the scope of the present document and should be specified by GEM recording specifications.
When playing content which is currently being recorded, if the end of the content to be recorded is reached and recording stops, the playback must continue without interruption (but not necessarily perfectly seamlessly), regardless of any (implementation dependent) process to copy the newly recorded content from any temporary buffer to a more permanent location on the storage device.
3) A time shall be synthesized which increases linearly from the start of the recorded content to the end. This shall be used as the basis of the "time base time" and "media time" as defined by JMF. No relationship is required between this time and any time that forms part of the recorded content such as MPEG PCR, STC or DSMCC NPT.
4) When playing an entire recording with multiple segments, the segments shall be treated as a single recording. The segments shall be played in the order they appear in the array returned from the SegmentedRecordedService.getSegments method, and a time shall be synthesized for all of the segments as described in rule 3 above. When playing across a gap between two segments the (implementation dependent) time of the gap should be minimized as much as possible, with a gap time of zero as the goal.
5) When playing a single segment from a recording that includes multiple segments the single segment shall be treated as an entire recording playback. The beginning and end of the segment shall be the beginning and end of the playback.
6) For all transmitted time lines which form part of the original recording, reconstruct each time line when the current media time is in the range for which that time line is valid.
Applications where labeled parts of the application with the storage property "critical to store" have not been stored shall be treated as if none of the application has been stored. They shall not appear to be present.
When the initiating_replay_flag in the Application recording description of any application in the recording is set to "1", the GEM recording terminal shall launch the first auto-start application in the recording which has initiating_replay_flag set to "1". The GEM recording terminal shall not initiate the playback of the recordable streams since this is the responsibility of that application.
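The synthesized time for a segmented recording can be illustrated with a short sketch. It assumes each segment's duration is known in seconds and that gaps between segments contribute no media time; the function names are hypothetical and are not part of the GEM or JMF APIs.

```python
def segment_offsets(durations):
    """Start offset of each segment on the synthesized time line, plus the total."""
    offsets = []
    total = 0.0
    for duration in durations:
        offsets.append(total)
        total += duration
    return offsets, total


def locate(media_time, durations):
    """Map a synthesized media time to (segment index, time within segment)."""
    if not durations:
        raise ValueError("recording has no segments")
    offsets, total = segment_offsets(durations)
    if not 0 <= media_time <= total:
        raise ValueError("media time outside the recording")
    # Walk backwards so a time on a boundary maps to the later segment.
    for index in reversed(range(len(durations))):
        if media_time >= offsets[index]:
            return index, media_time - offsets[index]


# Three segments, played in the order returned by getSegments
position = locate(12.0, [10.0, 5.0, 20.0])
```

Playing a single segment on its own corresponds to calling the same routine with a one-element list.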

Note: This material is from an IEEE paper and has not been modified by me.

Tuesday, October 28, 2014

Oil Runtime Compiler (ORC)

Orc (Oil Runtime Compiler) is a just-in-time compiler implemented as a library and set of associated tools for compiling and executing simple programs that operate on arrays of data. Orc is unlike other general-purpose JIT engines: the Orc bytecode and language are designed so that they can be readily converted into SIMD instructions. This translates to interesting language features and limitations: Orc has built-in capability for SIMD-friendly operations such as shuffling, saturated addition and subtraction, but only works on arrays of data. This makes Orc good for applications such as audio processing, array math, signal analysis and image processing.
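To make "saturated addition" on arrays concrete, here is a plain Python sketch of what such an operation computes for signed 16-bit samples, next to ordinary wrapping addition. It does not use Orc or its API; the function names are hypothetical and the code only illustrates the arithmetic that a SIMD saturating-add instruction performs element by element.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def add_saturated_s16(a, b):
    """Element-wise add that clamps to the int16 range instead of wrapping."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]


def add_wrapping_s16(a, b):
    """Element-wise add with two's-complement wrap-around, for comparison."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return [((x + y + 32768) % 65536) - 32768 for x, y in zip(a, b)]


# Mixing two loud audio samples: saturation clips, wrapping flips the sign
clipped = add_saturated_s16([32767, -32768, 100], [1, -1, 23])
wrapped = add_wrapping_s16([32767, -32768, 100], [1, -1, 23])
```

For audio mixing the clipped result is usually preferable, since a wrapped sample jumps to the opposite extreme.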

The “code” is currently an intermediate form that is roughly a platform-agnostic assembly language that understands variables and simple arrays. It’s an intermediate form in the sense that it’s currently only stored as a list of structs — there isn’t a parser yet.

The Orc language is an assembly-like language. The Orc tools convert it into an intermediate form (as C source code) that includes the bytecode, functions for automatically calling the compiled bytecode from C, and backup functions (in C) that are used when Orc is not available or is unable to compile the bytecode into efficient SIMD code. Orc generates code optimized for the specific processor that is running, using the instruction sets that are available. ORC will include compiler support for speculative multi-threading.

Since Orc creates backup functions, you can optionally use many Orc-using libraries and applications without actually having the Orc library — this is useful for transitioning an application into using Orc. However, when Orc is disabled, the application is unable to use the SIMD code generated by Orc.

Additional opcode sets can be created and registered in a manner similar to how the liborc-float and liborc-pixel libraries do. In order to make full use of new opcode sets, one must also define rules for translating these opcodes into target code. The example libraries do this by registering rule sets for various targets (mainly SSE) for their opcode sets. Orc provides a low-level API for generating target code. Not all possible target instructions can be generated with the target API, so developers may need to modify and add functions to the main Orc library as necessary to generate target code.

Reference:

http://cgit.freedesktop.org/gstreamer/orc/

http://nesl.ee.ucla.edu/fw/han/old_machine_backup/overo-oe/tmp/sysroots/i686-linux/usr/share/gtk-doc/html/orc/orc-concepts.html
http://code.entropywave.com/git?p=orc.git;a=summary

Wednesday, October 15, 2014

Media Transport MMT for 4K/8K Video Transmission

MPEG MMT [15] succeeds MPEG-2 TS as the media transport solution for broadcasting and IP network content distribution. It aims to serve new applications such as UHD and second screen, with full support for HTML5 and simpler packetization and synchronization over a pure IP-based transport. It has the following technology innovations:
● Convergence of IP transport and HTML 5 presentation
● Multiplexing of various streaming components from different sources
● Simplification of TS stack and easy conversion between storage file format and streaming format
● Support for multiple devices and hybrid delivery
● Advanced QoS/QoE engineering features

To achieve efficient delivery of MPEG Media data over heterogeneous IP networks, MMT defines encapsulation formats, delivery protocols, and signalling message formats as shown.

● “Everything that can be streamed over the Internet will be”
● Netflix and YouTube have announced 4K streaming
● Netflix rules the streaming world (at least in North America)
● How does Netflix's streaming video service work?
– Cloud-based virtual (and actual) servers on Amazon's EC2
– HTTP used as both a network and media protocol
– Network content caching provided by third-party Content Delivery Networks (Akamai, Level 3, Limelight)
● A too-quick conclusion: the best way to stream 4K/8K video is to deploy a global HTTP-based delivery service on top of Amazon's Cloud with a helping hand from Akamai and friends
● As a result of Netflix's lead, will all 4K/8K streams originate from within the Cloud?

– Will 4K/8K streaming simply be another form of Cloud-based video?
● Apart from Netflix's streaming activities, video (including 4K/8K and Digital Cinema) is already moving to the Cloud
– Cloud-based production
– Cloud-based workflow
– Cloud-based storage

● Is the Cloud ready for 4K/8K streaming?
– Should we worry about the stress that 4K/8K streaming, with its extreme bit-rates, will place on Internet servers and Cloud infrastructure?

How high will the streaming video spike be when:
– 1080p High Definition streaming is common with bit-rates of 5+ Mbps (versus only 300-500 Kbps now)?
– LTE/4G networks are ubiquitous and Tablets and SmartPhones start receiving 2-6 Mbps video streams?
● The spike will be off the chart with 25 Mbps 4K streams
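
To put the bit-rates above in perspective, here is a small back-of-the-envelope sketch in Python. It only converts a constant bit-rate into data volume per hour. The function name is made up for illustration, and the three rates are the round figures quoted in the bullets (0.5, 5 and 25 Mbps), not measurements.

```python
def gigabytes_per_hour(bitrate_mbps: float) -> float:
    """Data volume of one hour of streaming at a constant bit-rate, in decimal GB."""
    bits_per_hour = bitrate_mbps * 1_000_000 * 3600
    return bits_per_hour / 8 / 1_000_000_000


# Round figures taken from the bullets above (assumed constant bit-rate).
for label, mbps in [("SD, 0.5 Mbps", 0.5), ("1080p, 5 Mbps", 5), ("4K, 25 Mbps", 25)]:
    print(f"{label}: {gigabytes_per_hour(mbps):.3f} GB per hour")
```

Under these assumptions a single 4K viewer pulls about fifty times the data of a 500 Kbps stream, which is the scale of the spike the bullets describe.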

Most Internet Video is transmitted from the Cloud
● The rapid growth of Internet Video has triggered a dramatic increase in Data Centres worldwide ...
● ... which has ignited explosive growth in the number of Internet servers
● Estimated 75 million Internet servers today (Gautam Shroff, The Intelligent Web)
– But could be much higher: Actual total unknown!!
● 785,293,473 web sites worldwide as of November 2013 (Netcraft Web Server Survey)
– Microsoft has 1 million servers (source: Steve Ballmer, Microsoft)
– Google is on its way to 10 million servers (source: Google)

4K/8K Challenge: Efficient Protocols
● Streaming protocols in wide use today were not designed for the 25-100 megabits/second rates of 4K/8K streaming
● HTTP (Hypertext Transfer Protocol) problems:
– Stateless, text-based protocol designed for delivering HTML and images, not video
● HTTP 2.0 offers some efficiency improvements
● RTP (Real-time Transport Protocol) problems:
– No support for network quality of service (QoS)
– RTP's accompanying Real Time Control Protocol adds complexity
– Must be modified on an ad-hoc basis to meet the requirements of high bit-rate, low-latency streaming (see: BBC Research & Development WHP 268)

MMT in the Future Internet
Even though it is nowadays clear that almost all transport-layer protocols are converging to IP regardless of their characteristics, today’s Internet architecture is not optimal for multimedia services. Therefore, future networks such as CCNs will not only provide a better network architecture for multimedia delivery, but will also require a multimedia transport solution that is more aware of a delivery network’s requirements. MMT addresses such requirements, both by exposing the detailed information required by the underlying delivery layer that is agnostic to the specific media type and by defining an application-layer protocol that is optimized for multimedia delivery.

Reference:
1. Streaming 4K/8K Video over IP Networks
2. Codec Technology
3. Media Transport MMT for 4K/8K Video Transmission

Friday, October 10, 2014

Network-as-a-Service (NaaS)

Network-as-a-service (NaaS) is a business model for delivering network services virtually over the Internet on a pay-per-use or monthly subscription basis. From the customer's point of view, the only thing required to create an information technology (IT) network is one computer, an Internet connection and access to the provider's NaaS portal. This concept can be appealing to new business owners because it saves them from spending money on network hardware and the staff it takes to manage a network in-house. In essence, the network becomes a utility, paid for just like electricity or water or heat. Because the network is virtual, all its complexities are hidden from view.

Requirements
For a NaaS model to be used in DCs, we believe that the following requirements must be satisfied:

Requirement 1:
Integration with current DC hardware. Existing DCs constitute a significant investment. The use of commodity networking equipment, which typically lacks programmability features, reduces the cost of large DC deployments. For NaaS to become successful, it must not require expensive, non-commodity networking hardware.

Requirement 2:
High-level programming model. NaaS should expose a programming model that is natural for software developers to use, hiding low-level details of network packet processing and not exposing the full complexity of the physical network topology in the DC.

Requirement 3:
Scalability and multi-tenant isolation. Compared to existing software-based router solutions, NaaS must be able to support a multitude of different applications, written by different organisations and running concurrently, unaware of each other. Therefore, to be successful, a NaaS model requires strong isolation of the different network resources offered to tenants.

Saguna and Akamai Showcase the First CDN Operating from the Mobile Base Station

· Saguna and Akamai showcase the first Content Delivery Network (CDN) that operates from within the Mobile Base Station.

· This new class of CDNs leverages real-time radio congestion monitoring and proximity to mobile users to deliver superior user experience, new monetization opportunities and improved network economics.

· Getting content as close as possible to the mobile user is crucial to reducing round-trip-time (RTT) and making accurate performance tuning decisions, in order to improve the mobile Internet user experience.

· Radio network edge placement, in concert with the CDN serving the content, is especially beneficial when trying to optimize HTTPS content, since this type of content cannot be cached or optimized inside the mobile network using standard “transparent” caching methods or optimization techniques.

http://www.saguna.net/news-events/press-releases/saguna-and-akamai-showcase-the-world-s-first-content-delivery-network-operating-from-the-mobile-base-station/

It seems that they are just a caching solution that sits at the RAN level, so they can be faster than a CDN sitting in the core, without doing compression, video optimization, etc.

http://www.saguna.net/products/accelerate/

Thursday, October 9, 2014

Infrastructure as a Service (IaaS)

Infrastructure as a Service (IaaS) is a form of cloud computing that provides virtualized computing resources over the Internet. IaaS is one of three main categories of cloud computing services, alongside Software as a Service (SaaS) and Platform as a Service (PaaS).

In an IaaS model, a third-party provider hosts hardware, software, servers, storage and other infrastructure components on behalf of its users. IaaS providers also host users' applications and handle tasks including system maintenance, backup and resiliency planning. IaaS platforms offer highly scalable resources that can be adjusted on-demand. IaaS customers pay on a per-use basis, typically by the hour, week or month. Some providers also charge customers based on the amount of virtual machine space they use. This pay-as-you-go model eliminates the capital expense of deploying in-house hardware and software. However, users should monitor their IaaS environments closely to avoid being charged for unauthorized services.

The following are salient examples of how IaaS can be utilised by enterprise:

Infrastructure designed for enterprises; used by internal business networks, such as private clouds and virtual local area networks, which utilise pooled server and networking resources and in which a business can store its data and run the applications it needs to operate day-to-day. Expanding businesses can scale their infrastructure in accordance with their growth whilst private clouds (accessible only by the business itself) can protect the storage and transfer of the sensitive data that some businesses are required to handle.

Cloud hosting (Public/Private/Hybrid); the hosting of websites on virtual servers which are founded upon pooled resources from underlying physical servers. A website hosted in the cloud, for example, can benefit from the redundancy provided by a vast network of physical servers and on-demand scalability to deal with unexpected demands placed on the website.

Virtual Data Centers (VDC) independent of locations; a virtualized network of interconnected virtual servers which can be used to offer enhanced cloud hosting capabilities, enterprise IT infrastructure or to integrate all of these operations within either a private or public cloud implementation.

A typical Infrastructure as a Service offering can deliver the following features and benefits:

Scalability; resource is available as and when the client needs it and, therefore, there are no delays in expanding capacity or the wastage of unused capacity

Zero hardware investment ; the underlying physical hardware that supports an IaaS service is set up and maintained by the cloud provider, saving the time and cost of doing so on the client side

Cost based on Utility; the service can be accessed on demand and the client only pays for the resource that they actually use

Location Free; the service can usually be accessed from any location as long as there is an internet connection and the security protocol of the cloud allows it

Securing locations of data centers; services available through a public cloud, or private clouds hosted externally with the cloud provider, benefit from the physical security afforded to the servers which are hosted within a data center.

Tuesday, September 30, 2014

Overview of Netflix architecture

In order to observe the basic service behavior, we create a new user account, log in to the Netflix website and play a movie. We monitor the traffic during all of this activity and record the host names of the servers involved in the process, and then perform DNS resolutions to collect the canonical names (CNAMEs) and IP addresses of all the server names that the browser has contacted. We also perform WHOIS lookups for the IP addresses to find out their owners. Table I summarizes the most relevant hostnames and their owners. Fig. shows the basic architecture for the Netflix video streaming platform.
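
The DNS step of this measurement can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the function name `resolve_hosts` is made up here, it relies on the standard library call `socket.gethostbyname_ex` (IPv4 only), and it does not cover the WHOIS lookups. The resolver is passed in as a parameter so the logic can be exercised without network access.

```python
import socket


def resolve_hosts(hostnames, resolver=socket.gethostbyname_ex):
    """Map each hostname to its canonical name, aliases and IPv4 addresses.

    `resolver` must behave like socket.gethostbyname_ex and return a
    (canonical_name, alias_list, address_list) tuple.
    """
    results = {}
    for name in hostnames:
        try:
            canonical, aliases, addresses = resolver(name)
        except OSError as err:  # socket.gaierror and socket.herror derive from OSError
            results[name] = {"error": str(err)}
            continue
        results[name] = {
            "canonical": canonical,
            "aliases": aliases,
            "addresses": addresses,
        }
    return results
```

Calling `resolve_hosts(["www.netflix.com", "movies.netflix.com"])` with the default resolver performs live lookups. The answers depend on when and where the query is made, so they will not necessarily match what was observed at the time of the study.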

Architecture:

It consists of four key components: the Netflix data center, the Amazon cloud, CDNs and Netflix players.

Data Centers. Our analysis reveals that Netflix uses its own IP address space for the hostname www.netflix.com. This server primarily handles two key functions: (a) registration of new user accounts and capture of payment information (credit card or PayPal account), and (b) redirecting users to movies.netflix.com or signup.netflix.com based on whether the user is logged in or not, respectively. This server does not interact with the client during movie playback, which is consistent with the recent presentation from the Netflix team [9].

Amazon Cloud. Except for www.netflix.com, which is hosted by Netflix, most of the other Netflix servers, such as agmoviecontrol.netflix.com and movies.netflix.com, are served off the Amazon cloud [10]. [9] indicates that Netflix uses various Amazon cloud services, ranging from EC2 and S3 to SDB and VPC. Key functions, such as content ingestion, log recording/analysis, DRM, CDN routing, user sign-in, and mobile device support, are all done in the Amazon cloud.

Content Distribution Networks (CDNs). Netflix employs multiple CDNs to deliver the video content to end users. The encoded and DRM-protected videos are sourced in the Amazon cloud and copied to CDNs. Netflix employs three CDNs: Akamai, LimeLight, and Level-3. For the same video with the same quality level, the same encoded content is delivered from all three CDNs. In Section II-D we study the Netflix strategy used to select these CDNs to serve videos.

Netflix Players. Netflix uses Silverlight to download, decode and play Netflix movies on desktop web browsers. The run-time environment for Silverlight is available as a plug-in for most web browsers. There are also players for mobile phones and other devices such as Wii, Roku, etc.

Netflix uses the DASH (Dynamic Adaptive Streaming over HTTP) protocol for streaming. In DASH, each video is encoded at several different quality levels and is divided into small "chunks", video segments of no more than a few seconds in length. The client requests one video chunk at a time via HTTP. With each download, it measures the received bandwidth and runs a rate determination algorithm to determine the quality of the next chunk to request. DASH allows the player to freely switch between different quality levels at the chunk boundaries.
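
The chunk-by-chunk rate decision can be illustrated with a very simple throughput-based rule. This is only a sketch: the actual rate determination algorithm used by the Netflix player is not described here, and the function names, the 0.8 safety factor and the bit-rate ladder in the usage note are all assumptions made for the example.

```python
def measured_throughput_kbps(chunk_bytes: int, download_seconds: float) -> float:
    """Throughput observed while downloading one chunk, in kilobits per second."""
    return chunk_bytes * 8 / 1000 / download_seconds


def next_quality(bitrates_kbps, measured_kbps, safety=0.8):
    """Pick the highest encoded bit-rate that fits within a fraction of the
    measured throughput; fall back to the lowest level if none fits."""
    budget = measured_kbps * safety
    candidates = [rate for rate in sorted(bitrates_kbps) if rate <= budget]
    return candidates[-1] if candidates else min(bitrates_kbps)
```

For example, with a hypothetical ladder of 235, 560, 1050, 1750 and 3000 Kbps, a 500,000-byte chunk that took 2 seconds to download gives a measured throughput of 2000 Kbps, and the rule above requests the next chunk at 1050 Kbps. Because the decision is made per chunk, the switch happens at a chunk boundary.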

Ref: Unreeling Netflix: Understanding and Improving Multi-CDN Movie Delivery IEEE paper

Wednesday, September 17, 2014

libde265/HEVC Supported features

feature	v0.5	v0.6	v0.7-v0.9
P slices	yes	yes	yes
B slices	yes	yes	yes
AMP	yes	yes	yes
PCM	yes	yes	yes
deblocking	yes	yes	yes
SAO	yes	yes	yes
weighted pred.	yes	yes	yes
adaptive quant.	incomplete	yes	yes
multiple slices	no	incomplete	yes
dependent slices	no	incomplete	yes
scaling lists	no	yes	yes
long-term MC	no	incomplete	yes
ref.pic.list modification	no	yes	yes
chroma 4:2:2	no	no	no
10 bit	no	no	no
parallel WPP	yes	yes	yes
parallel tiles	yes	yes	yes
parallel frames	no	no	no
SSE acceleration	yes	yes	yes
ARM acceleration	no	no	no
frame-dropping API	no	no	incomplete(1)
non-conformant speed hacks	no	no	yes(2)

incomplete = may work for some streams
(1) - streams with multiple temporal sub-streams only
(2) - deblocking and SAO can be switched off to increase decoding speed
https://github.com/strukturag/libde265/wiki/Supported-decoding-features

Friday, September 5, 2014

Writing Great Unit Tests: Why Bother?


Unit testing is the practice of testing the components of a program automatically, using a test program to provide inputs to each component and check the outputs. The tests are usually written by the same programmers as the software being tested, either before or at the same time as the rest of the software. What’s the difference between a good unit test and a bad one? How do you learn how to write good unit tests? It’s far from obvious. Even if you’re a brilliant coder with decades of experience, your existing knowledge and habits won’t automatically lead you to write good unit tests, because it’s a different kind of coding and most people start with unhelpful false assumptions about what unit tests are supposed to achieve. It’s overwhelmingly easy to write bad unit tests that add very little value to a project while inflating the cost of code changes astronomically. Does that sound agile to you?

Unit testing is not about finding Defects

Unit tests, by definition, examine each unit of your code separately. But when your application is run for real, all those units have to work together, and the whole is more complex and subtle than the sum of its independently-tested parts. At a high-level, unit testing refers to the practice of testing certain functions and areas – or units – of our code. This gives us the ability to verify that our functions work as expected. That is to say that for any function and given a set of inputs, we can determine if the function is returning the proper values and will gracefully handle failures during the course of execution should invalid input be provided.

Ultimately, this helps us to identify failures in our algorithms and/or logic to help improve the quality of the code that composes a certain function. As you begin to write more and more tests, you end up creating a suite of tests that you can run at any time during development to continually verify the quality of your work.

A second advantage to approaching development from a unit testing perspective is that you'll likely be writing code that is easy to test. Since unit testing requires that your code be easily testable, it means that your code must support this particular type of evaluation. As such, you're more likely to have a higher number of smaller, more focused functions that provide a single operation on a set of data rather than large functions performing a number of different operations.

A third advantage for writing solid unit tests and well-tested code is that you can prevent future changes from breaking functionality. Since you're testing your code as you introduce your functionality, you're going to begin developing a suite of test cases that can be run each time you work on your logic. When a failure happens, you know that you have something to address.

Well then, if unit testing isn’t about finding bugs, what is it about?

I bet you’ve heard the answer a hundred times already, but since the testing misconception stubbornly hangs on in developers’ minds, I’ll repeat the principle. As test-driven development (TDD) advocates keep saying, “TDD is a design process, not a testing process”. Let me elaborate: TDD is a robust way of designing software components (“units”) interactively so that their behavior is specified through unit tests. That’s all!

Tips for writing great unit tests

Enough theoretical discussion – time for some practical advice. Here’s some guidance for writing unit tests that sit comfortably at the sweet spot on the preceding scale, and are virtuous in other ways too.

· Make each test orthogonal (i.e., independent) to all the others

Any given behavior should be specified in one and only one test. Otherwise if you later change that behavior, you’ll have to change multiple tests. The corollaries of this rule include

· Test passes but not testing the actual feature

Beware of test cases finding their way into your code repository that seem to do a lot in code but in fact do nothing. I have seen tests that sent requests to a server and passed no matter what the server responded. Horror! Such tests become a liability: you have to carry them, build them and run them every time, yet they add no value.

· Testing irrelevant things

This is another sign of a bad test case. I have seen developers check multiple irrelevant things so that the test passes as long as the code does something, not necessarily the correct thing. The best approach is to follow the single responsibility principle: one unit test case should test only one thing, and that’s all.

· Don’t make unnecessary assertions

Which specific behavior are you testing? It’s counterproductive to Assert() anything that’s also asserted by another test: it just increases the frequency of pointless failures without improving unit test coverage one bit. This also applies to unnecessary Verify() calls – if it isn’t the core behavior under test, then stop making observations about it! Sometimes, testing folks express this by saying “have only one logical assertion per test”. Remember, unit tests are a design specification of how a certain behavior should work, not a list of observations of everything the code happens to do.

· Don't ship code with tests that fail, even if it doesn't matter that they fail.

It's not uncommon, particularly in test-driven development, to change your mind during design about which tests are correct or relevant, or to make an initial implementation that only covers some of the test suite. But that means you end up with failed tests that you don't actually care about. Remove them, or at very least, document them: anyone running your tests should be able to assume that a failed test indicates broken code.

· Consider using a Code coverage tool to check how much of your code is actually being tested. Coverage doesn't tell you everything: it only measures static execution paths, but it can give you some idea of things you might have missed altogether.

· Whenever you find a bug in “finished code”, add a test for it.

Make sure the test fails in the buggy code and passes when it is fixed. Areas of code you've found bugs in are more likely to be fragile in general, and bugs that have already been found are relatively likely to reappear.

· Test only one code unit at a time

Your architecture must support testing units (i.e., classes or very small groups of classes) independently, not all chained together. Otherwise, you have lots of overlap between tests, so changes to one unit can cascade outwards and cause failures everywhere.

If you can’t do this, then your architecture is limiting your work’s quality – consider using Inversion of Control.

· Separate out all external services and state

Otherwise, behavior in those external services overlaps multiple tests, and state data means that different unit tests can influence each other’s outcome. You’ve definitely taken a wrong turn if you have to run your tests in a specific order, or if they only work when your database or network connection is active. (By the way, sometimes your architecture might mean your code touches static variables during unit tests. Avoid this if you can, but if you can’t, at least make sure each test resets the relevant statics to a known state before it runs.)

· Avoid unnecessary preconditions

Avoid having common setup code that runs at the beginning of lots of unrelated tests. Otherwise, it’s unclear what assumptions each test relies on, and indicates that you’re not testing just a single unit. An exception: Sometimes I find it useful to have a common setup method shared by a very small number of unit tests (a handful at the most) but only if all those tests require all of those preconditions. This is related to the context-specification unit testing pattern, but still risks getting unmaintainable if you try to reuse the same setup code for a wide range of tests.

· Don’t unit-test configuration settings

By definition, your configuration settings aren’t part of any unit of code (that’s why you extracted the setting out of your unit’s code). Even if you could write a unit test that inspects your configuration, it merely forces you to specify the same configuration in an additional redundant location.
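
Several of the tips above (test one unit at a time, separate out external services, one logical assertion per test) can be seen together in a small sketch. It uses Python's standard `unittest` module. The class names `PriceService` and `FakeRateProvider` are invented for this illustration and do not come from any real project. The dependency is handed to the unit from outside (a simple form of Inversion of Control), so the test needs no network or database.

```python
import unittest


class PriceService:
    """Unit under test: converts amounts using a rate provider supplied from outside."""

    def __init__(self, rate_provider):
        self._rate_provider = rate_provider

    def convert(self, amount, currency):
        return round(amount * self._rate_provider.rate_for(currency), 2)


class FakeRateProvider:
    """In-memory stand-in for an external exchange-rate service."""

    def __init__(self, rates):
        self._rates = rates

    def rate_for(self, currency):
        return self._rates[currency]


class ConvertTest(unittest.TestCase):
    def test_convert_applies_rate(self):
        # Each test builds its own state, so tests cannot influence each other.
        service = PriceService(FakeRateProvider({"EUR": 0.5}))
        # One logical assertion: the behavior this test specifies.
        self.assertEqual(service.convert(10, "EUR"), 5.0)
```

Saved in a file, the test can be run with `python -m unittest`. It passes or fails based only on `PriceService`, whatever state a real rate service is in.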

Conclusion:

Without doubt, unit testing can significantly increase the quality of your project. Many in our industry claim that any unit tests are better than none, but I disagree: a test suite can be a great asset, or it can be a great burden that contributes little. It depends on the quality of those tests, which seems to be determined by how well its developers have understood the goals and principles of unit testing.

Monday, September 1, 2014

MPEG2 PS/TS/Timecodes

MPEG-2 Program Stream Muxing

ffmpeg -genpts 1 -i ES_Video.m2v -i ES_Audio.mp2 -vcodec copy -acodec copy -f vob output.mpg

Note : In order to mux multiple audio tracks into the same file :
ffmpeg -genpts 1 -i ES_Video.m2v -i ES_Audio1.mp2 -i ES_Audio2.mp2 -vcodec copy -acodec copy -f vob output.mpg -newaudio

Note : In order to remux a PS file with multiple audio tracks :
ffmpeg -i input.mpg -vcodec copy -acodec copy -f vob output.mpg -acodec copy -newaudio

MPEG-2 Program Stream Demuxing

ffmpeg -i input.mpg -vcodec copy -f mpeg2video ES_Video.m2v -acodec copy -f mp2 ES_Audio.mp2

Note : This also works for files containing multiple audio tracks :
ffmpeg -i input.mpg -vcodec copy -f mpeg2video ES_Video.m2v -acodec copy -f mp2 ES_Audio1.mp2 -acodec copy -f mp2 ES_Audio2.mp2

MPEG-2 Start Timecode

ffmpeg -i <input_file> -timecode_frame_start <start_timecode> -vcodec mpeg2video -an output.m2v

Note : Start timecode is set as number of frames. For instance, if you want to start at 18:12:36:15, you will have to set -timecode_frame_start to 1638915 ( for 25 fps content ).
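
The frame count in the note can be reproduced with a short Python helper. This is a sketch for non-drop-frame timecode at an integer frame rate only. The function name is made up for illustration.

```python
def timecode_to_frames(timecode: str, fps: int) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) * fps + frames


print(timecode_to_frames("18:12:36:15", 25))  # 1638915
```

The arithmetic is (18 x 3600 + 12 x 60 + 36) x 25 + 15 = 65556 x 25 + 15 = 1638915, which matches the value quoted above.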

Friday, August 29, 2014

Comparison of encoder compression efficiency (libde265)

Ref: https://github.com/strukturag/libde265/wiki/Encoder-comparison

Comparison of encoder compression efficiency

Cactus 1920x1080, 50 fps
Intra-frame distance set to 248 frames, x265 without WPP, HM in random-access configuration, otherwise default parameters. Date: 20/Aug/2014.

Johnny 1280x720, 600 frames

All encoders at default parameters (random-access), 21/Jul/2014.

Paris 352x288
Default parameters have been used for x265, x264, and f265. HM13 uses random-access configuration. Date: 18/Jul/2014.

Thursday, August 14, 2014

Threading/Thread Pool X265

x265 creates a pool of worker threads and shares this thread pool with all encoders within the same process (it is process global, aka a singleton). The number of threads within the thread pool is determined by the encoder which first allocates the pool, which by definition is the first encoder created within each process.

--threads specifies the number of threads the encoder will try to allocate for its thread pool. If the thread pool was already allocated this parameter is ignored. By default x265 allocates one thread per (hyperthreaded) CPU core in your system.

Work distribution is job based. Idle worker threads ask their parent pool object for jobs to perform. When no jobs are available, idle worker threads block and consume no CPU cycles.

Objects which desire to distribute work to worker threads are known as job providers (and they derive from the JobProvider class). When job providers have work they enqueue themselves into the pool’s provider list (and dequeue themselves when they no longer have work). The thread pool has a method to poke awake a blocked idle thread, and job providers are recommended to call this method when they make new jobs available.
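
The job-provider pattern described above can be sketched in Python. This is an illustration of the idea only, not x265's C++ implementation: the class and method names are invented, and for brevity the pool removes an exhausted provider from its list, whereas in x265 providers dequeue themselves.

```python
import threading


class JobProvider:
    """Hypothetical provider: hands out queued callables one at a time."""

    def __init__(self, jobs):
        self._jobs = list(jobs)
        self._lock = threading.Lock()

    def take_job(self):
        with self._lock:
            return self._jobs.pop(0) if self._jobs else None


class ThreadPool:
    def __init__(self, num_threads):
        self._cond = threading.Condition()
        self._providers = []
        self._stopping = False
        self._threads = [threading.Thread(target=self._worker) for _ in range(num_threads)]
        for thread in self._threads:
            thread.start()

    def enqueue(self, provider):
        """Add a provider to the provider list and poke blocked workers awake."""
        with self._cond:
            self._providers.append(provider)
            self._cond.notify_all()

    def _next_job(self):
        # Called with self._cond held: ask providers for work in list order.
        while self._providers:
            job = self._providers[0].take_job()
            if job is not None:
                return job
            self._providers.pop(0)  # provider has no more work
        return None

    def _worker(self):
        while True:
            with self._cond:
                job = self._next_job()
                while job is None and not self._stopping:
                    self._cond.wait()  # idle workers block here and use no CPU
                    job = self._next_job()
                if job is None:
                    return  # stopping and nothing left to do
            job()  # run the job outside the lock

    def shutdown(self):
        """Let workers drain remaining jobs, then join them."""
        with self._cond:
            self._stopping = True
            self._cond.notify_all()
        for thread in self._threads:
            thread.join()
```

A caller creates one pool, wraps its work in a `JobProvider`, calls `enqueue`, and later calls `shutdown`. The condition variable plays the role of the "poke awake" method: workers sleep in `wait()` until a provider announces new work.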

x265_cleanup() frees the process-global thread pool, allowing it to be reallocated if necessary, but only if no encoders are allocated at the time it is called.

Frame Threading

Frame threading is the act of encoding multiple frames at the same time. It is a challenge because each frame will generally use one or more of the previously encoded frames as motion references and those frames may still be in the process of being encoded themselves.

Previous encoders such as x264 worked around this problem by limiting the motion search region within these reference frames to just one macroblock row below the coincident row being encoded. Thus a frame could be encoded at the same time as its reference frames so long as it stayed one row behind the encode progress of its references (glossing over a few details).

x265 has the same frame threading mechanism, but we generally have much less frame parallelism to exploit than x264 because of the size of our CTU rows. For instance, with 1080p video x264 has 68 16x16 macroblock rows available each frame while x265 only has 17 64x64 CTU rows.

The second extenuating circumstance is the loop filters. The pixels used for motion reference must be processed by the loop filters, and the loop filters cannot run until a full row has been encoded. They must also run a full row behind the encode process so that the pixels below the row being filtered are available. When you add up all the row lags, each frame ends up being 3 CTU rows behind its reference frames (the equivalent of 12 macroblock rows for x264).
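
The row counts quoted above follow from simple ceiling division, as this short Python sketch shows. The function names are made up for illustration. The block sizes (16x16 macroblocks, 64x64 CTUs) and the 3-row lag are the figures given in the text.

```python
def rows_per_frame(frame_height: int, block_size: int) -> int:
    """Number of block rows needed to cover a frame (ceiling division)."""
    return -(-frame_height // block_size)


def lag_in_macroblock_rows(ctu_rows: int, ctu_size: int = 64, mb_size: int = 16) -> int:
    """Express a lag measured in CTU rows as the equivalent number of macroblock rows."""
    return ctu_rows * ctu_size // mb_size


print(rows_per_frame(1080, 16))   # 68 macroblock rows for x264
print(rows_per_frame(1080, 64))   # 17 CTU rows for x265
print(lag_in_macroblock_rows(3))  # 12
```

With only 17 rows per 1080p frame, a 3-row lag is a much larger fraction of the frame than a single 16-pixel row is for x264, which is why there is less frame parallelism to exploit.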

The third extenuating circumstance is that when a frame being encoded becomes blocked waiting for a reference frame row to become available, that frame’s wave-front becomes completely stalled, and when the row becomes available it can take quite some time for the wave to be restarted, if it ever does. This makes WPP many times less effective when frame parallelism is in use.

--merange can have a negative impact on frame parallelism. If the range is too large, more rows of CTU lag must be added to ensure those pixels are available in the reference frames. Similarly --sao-lcu-opt 0 will cause SAO to be performed over the entire picture at once (rather than being CTU based), which prevents any motion reference pixels from being available until the entire frame has been encoded, which prevents any real frame parallelism at all.

Note: Even though the merange is used to determine the amount of reference pixels that must be available in the reference frames, the actual motion search is not necessarily centered around the coincident block. The motion search is actually centered around the motion predictor, but the available pixel area (mvmin, mvmax) is determined by merange and the interpolation filter half-heights.

When frame threading is disabled, the entirety of all reference frames are always fully available (by definition) and thus the available pixel area is not restricted at all, and this can sometimes improve compression efficiency. Because of this, the output of encodes with frame parallelism disabled will not match the output of encodes with frame parallelism enabled; but when enabled the number of frame threads should have no effect on the output bitstream except when using ABR or VBV rate control or noise reduction.

When --nr is enabled, the outputs of each number of frame threads will be deterministic but none of them will match because each frame encoder maintains a cumulative noise reduction state.

VBV introduces non-determinism in the encoder, at this point in time, regardless of the amount of frame parallelism.

By default frame parallelism and WPP are enabled together. The number of frame threads used is auto-detected from the (hyperthreaded) CPU core count, but may be manually specified via --frame-threads.

Each frame encoder runs in its own thread (allocated separately from the worker pool). This frame thread has some pre-processing responsibilities and some post-processing responsibilities for each frame, but it spends the bulk of its time managing the wave-front processing by making CTU rows available to the worker threads when their dependencies are resolved. The frame encoder threads spend nearly all of their time blocked in one of 4 possible locations:
blocked, waiting for a frame to process
blocked on a reference frame, waiting for a CTU row of reconstructed and loop-filtered reference pixels to become available
blocked waiting for wave-front completion
blocked waiting for the main thread to consume an encoded frame

Lookahead

The lookahead module of x265 (the lowres pre-encode which determines scene cuts and slice types) uses the thread pool to distribute the lowres cost analysis to worker threads. It follows the same wave-front pattern as the main encoder except it works in reverse-scan order.

The function slicetypeDecide() itself may also be performed by a worker thread if your system has enough CPU cores to make this a beneficial trade-off, else it runs within the context of the thread which calls the x265_encoder_encode().

Reference from: http://x265.readthedocs.org/en/latest/threading.html
