Beyond Visuals: A New Experience of UHD Video

Release Date: 2023-08-01 | By Xu Huoshun, Mao Liangui

Video is undergoing a technology-driven transformation in the 21st century. With the rapid progress of gigabit-capable 5G, Wi-Fi 6, and FTTH networks, together with advances in video production and distribution and the integration of AI, edge cloud, and CDN, video is evolving toward ultra-high-definition (UHD), multi-dimensional, immersive, and highly interactive experiences.

The new UHD scenario is a collective term for various types of UHD services. It encompasses a wide range of immersive service forms, such as 4K (and above) UHD video, VR video, multi-viewpoint video (MVV), free-viewpoint video (FVV), flexible scaling, and 2D-to-3D video conversion. These new UHD scenarios can be categorized into three types based on their features.

The first type is video featuring image-quality enhancement, such as 4K and 8K UHD video. Building on existing video services, these videos advance resolution, dynamic range, color gamut, frame rate, and sampling.

The second type is UHD video featuring dimensional enhancement and interaction, such as VR video, MVV based on frame synchronization, and 360-degree FVV. These videos enable multi-dimensional experiences and offer strong interactivity.

The third type is AI-enhanced UHD video, which uses technologies like 5G, cloud, and AI to provide an enhanced user experience. Examples include AI-enabled image-quality enhancement, AI-enabled 2D-to-3D video conversion, and AI virtual viewpoints.

8K UHD Video

8K video refers to ultra-high-definition video with an image resolution of 7680×4320 pixels, characterized by high resolution, high frame rate, high color depth, wide color gamut, and high dynamic range.

  • High resolution: The dense arrangement of 7680×4320 pixels enhances the level of detail and sharpness in the visuals.
  • High frame rate: True 8K video usually requires a frame rate of at least 120 fps. This ultra-high frame rate improves the smoothness and fluidity of the picture, especially for fast-moving content.
  • High color depth and wide color gamut: These features enable smoother color gradients, and richer and more lifelike colors.
  • High dynamic range: This feature enhances the contrast of the image, adding depth and a sense of 3D to the visuals, resulting in a more immersive and layered viewing experience.
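
Taken together, these parameters explain why 8K leans so heavily on compression and gigabit access networks. Here is a back-of-the-envelope calculation as a minimal sketch; the 10-bit, 4:2:2, 120 fps figures are illustrative assumptions:

```python
# Back-of-the-envelope uncompressed data rate for 8K video.
# Assumptions (illustrative): 10-bit samples, 4:2:2 chroma subsampling, 120 fps.
WIDTH, HEIGHT = 7680, 4320
BIT_DEPTH = 10
FPS = 120
SAMPLES_PER_PIXEL = 2  # 4:2:2: one luma plus half a Cb and half a Cr sample per pixel

bits_per_frame = WIDTH * HEIGHT * SAMPLES_PER_PIXEL * BIT_DEPTH
raw_gbps = bits_per_frame * FPS / 1e9
print(f"Uncompressed: {raw_gbps:.1f} Gbit/s")          # ~79.6 Gbit/s

# Even at a generous 100:1 compression ratio, the stream still needs
# roughly 800 Mbit/s -- hence the dependence on gigabit access networks.
print(f"At 100:1 compression: {raw_gbps * 1000 / 100:.0f} Mbit/s")
```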


8K UHD video has been widely used in live sports broadcasting, arts shows, and cultural and tourism live streaming. Users can enjoy an immersive experience of these live events on a big screen, with more realistic images, stunning special effects, richer scenarios, and finer performances. 8K UHD films and TV shows revolutionize the visual experience, making viewers feel as if they were physically present at the scene. In addition, leveraging its ultra-high resolution and realistic images, 8K UHD video can empower industries such as industrial manufacturing and remote healthcare.

Interactive Video

  • Frame-Synchronized MVV

MVV refers to the simultaneous broadcasting of multiple independent video streams from different viewpoints to users during events such as sports broadcasts. This allows users to choose their preferred viewpoint or even watch multiple viewpoints at the same time. By breaking the traditional rules of TV broadcasting, this function gives users the freedom to personalize their viewing experience.

As multi-viewpoint videos are transmitted independently, various synchronization issues may arise during camera capture, encoding, transmission, relay, buffering, decoding, and rendering. To address these issues, multi-channel frame synchronization technology is adopted to guarantee the consistency of the multi-viewpoint video feeds, giving users a seamless and synchronized viewing experience.

The core technology, multi-channel frame synchronization, uses a synchronization signal generator on the camera side to achieve signal synchronization. Synchronization tags are embedded during video encoding, and during playback the client decodes the frame synchronization information from the multiple video streams. This enables frame alignment during the final presentation, allowing users to switch seamlessly between viewpoints while maintaining a high level of temporal consistency across all frames.
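
A minimal sketch of the client-side alignment step described above (the Frame structure and tag values are hypothetical; real systems carry sync tags inside the bitstream itself):

```python
from dataclasses import dataclass

@dataclass
class Frame:
    sync_tag: int   # synchronization tag embedded at encode time
    pixels: bytes   # decoded picture data (placeholder)

def align_viewpoints(streams: dict[str, list[Frame]]) -> list[dict[str, Frame]]:
    """Group decoded frames from all viewpoints by their embedded sync tag,
    so the player can present (or switch between) temporally aligned frames."""
    # Only tags present in every stream can be rendered in sync.
    common = set.intersection(*({f.sync_tag for f in s} for s in streams.values()))
    aligned = []
    for tag in sorted(common):
        aligned.append({name: next(f for f in s if f.sync_tag == tag)
                        for name, s in streams.items()})
    return aligned

# Example: two camera feeds, with one frame dropped in transit on cam2.
cam1 = [Frame(1, b""), Frame(2, b""), Frame(3, b"")]
cam2 = [Frame(2, b""), Frame(3, b"")]
for group in align_viewpoints({"cam1": cam1, "cam2": cam2}):
    print({name: f.sync_tag for name, f in group.items()})  # tags 2 and 3 only
```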

  • 360-Degree FVV

FVV refers to a business model centered on events and activities, where multiple or surround cameras capture video from many viewpoints. Through stages such as production, transmission, distribution, and playback, users can freely rotate their view of the captured objects or scenes from any viewpoint by manipulating the user interface.

FVV fully leverages the advantages of multiple viewpoints, detailed capture, and freedom of viewing. It allows users to enjoy videos from different viewpoints and interactively rotate the view, enhancing their sense of participation and interactivity while freeing them from dependence on the traditional director's viewpoint.

FVV can also be applied in the production of special effect scenes, creating effects like bullet time and freeze-frame surround. With enhanced technologies like virtual viewpoints and AI recognition, FVV delivers an exceptional user experience, offering superior immersion and engagement.
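
As a rough illustration of the playback side, viewpoint selection can be reduced to mapping the user's rotation gesture to the nearest camera on the surround rig. The camera count and drag sensitivity below are hypothetical:

```python
NUM_CAMERAS = 36                      # hypothetical surround rig: one camera every 10 degrees
ANGLE_STEP = 360 / NUM_CAMERAS

def camera_for_angle(view_angle_deg: float) -> int:
    """Map the user's current rotation angle to the nearest physical camera."""
    return round(view_angle_deg % 360 / ANGLE_STEP) % NUM_CAMERAS

def on_drag(current_angle: float, drag_pixels: float, sensitivity: float = 0.25) -> float:
    """Translate a horizontal drag gesture into a new viewing angle."""
    return (current_angle + drag_pixels * sensitivity) % 360

angle = 0.0
angle = on_drag(angle, drag_pixels=200)   # user swipes right: +50 degrees
print(camera_for_angle(angle))            # -> camera index 5
```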

  • 8K VR FOV

Due to restrictions in terminal decoding capabilities and network bandwidth, current VR videos are mostly delivered in 4K resolution. As a result, the actual image quality experienced by users is relatively low, which significantly affects the viewing experience. Adopting 8K VR places higher demands on both transmission bandwidth and terminal decoding capabilities, which are not yet widely available in terminal devices. A tension therefore exists between the clarity of VR content and the resources it requires. Field-of-view (FOV) based transmission can effectively balance this trade-off.

The principle of FOV is to set up a dedicated encoding server at the system end and encode UHD VR videos in a layered and segmented manner. For 8K VR video, for example, a 2K 360° video serves as the base layer, supplemented by multiple UHD video tiles encoded for specific regions. During terminal decoding and presentation, the player first presents the base 2K video and then, based on the current FOV position, dynamically retrieves and renders the corresponding UHD tiles, ensuring high-quality video within the field of view. This approach greatly reduces transmission bandwidth while lowering the demands on terminal decoding. As a result, it enables 4K terminals to deliver an 8K VR viewing experience, achieving bandwidth savings of over 70% and keeping head-movement latency below 150 ms.
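
A minimal sketch of the client-side tile selection (the tile grid, FOV width, and yaw-only logic are illustrative assumptions, not ZTE's actual protocol):

```python
# FOV-based tile selection: always play the 2K base layer, and fetch
# only the UHD tiles that intersect the user's current field of view.
TILE_COLS = 8                 # hypothetical 8-column tiling of the 360-degree sphere
FOV_DEG = 110                 # typical HMD horizontal field of view

def visible_tiles(yaw_deg: float) -> list[int]:
    """Return column indices of UHD tiles overlapping the viewport (yaw only)."""
    tile_width = 360 / TILE_COLS
    half = FOV_DEG / 2
    return sorted({int(((yaw_deg + off) % 360) // tile_width)
                   for off in range(-int(half), int(half) + 1)})

# The player renders the 2K base immediately, then overlays these tiles
# as they arrive -- the rest of the sphere never leaves the server.
print(visible_tiles(yaw_deg=0))    # -> [0, 1, 6, 7]: 4 of 8 columns fetched
```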

AI-Powered UHD Video

  • AI-Enabled Image Enhancement

As display devices continue to grow in size and performance, people's expectations for video quality are also rising. There is thus significant demand for restoring and enhancing low-quality videos and images that suffer from issues such as age, low resolution, noise, and compression artifacts and no longer meet users' viewing needs. AI-enabled image enhancement applies AI algorithms such as super-resolution, frame interpolation, color enhancement, and image denoising to repair and enhance low-quality videos and images. For example, it can upgrade old standard-definition (SD) movies to high-definition (HD) or even 4K quality.
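
As one concrete example of the super-resolution step, OpenCV's contrib module ships a pretrained-model interface. A minimal sketch follows, assuming a downloaded ESPCN model file (the file paths are assumptions):

```python
import cv2

# Requires opencv-contrib-python and a pretrained model file such as
# ESPCN_x4.pb; the file names here are assumptions.
sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("ESPCN_x4.pb")
sr.setModel("espcn", 4)          # 4x upscaling: e.g. 960x540 SD -> 3840x2160 (4K)

frame = cv2.imread("sd_frame.png")
upscaled = sr.upsample(frame)    # AI super-resolution on a single frame
cv2.imwrite("uhd_frame.png", upscaled)
```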

  • AI-Enabled 2D-to-3D Video Conversion

3D services are an innovative video business model that emerged early on. However, due to the scarcity of 3D content and high production costs, these services have mostly remained confined to professional venues like cinemas, failing to reach a wider audience at home. To address this challenge, AI-enabled 2D-to-3D technology can automatically convert traditional 2D content into 3D, offering a potential solution to the shortage of 3D content.

By utilizing an AI-powered 2D-to-3D conversion engine, 3D videos can be generated from 2D content, thereby addressing the scarcity of 3D video resources. Furthermore, this technology supports real-time conversion of live broadcast signals into 3D video signals, enabling 3D live streaming services.
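
A common approach (not necessarily ZTE's exact engine) is depth estimation followed by depth-image-based rendering (DIBR): predict a per-pixel depth map with a monocular depth network, then shift pixels horizontally to synthesize a stereo pair. A minimal sketch of the rendering half, assuming the depth map already exists:

```python
import numpy as np

def synthesize_stereo(frame: np.ndarray, depth: np.ndarray, max_shift: int = 8):
    """Depth-image-based rendering: shift pixels horizontally in proportion
    to depth to produce a left/right stereo pair.
    frame: HxWx3 uint8 image; depth: HxW floats in [0, 1], 1 = nearest."""
    h, w, _ = frame.shape
    cols = np.arange(w)
    left = np.zeros_like(frame)
    right = np.zeros_like(frame)
    shift = (depth * max_shift).astype(int)      # nearer pixels shift more
    for y in range(h):
        lx = np.clip(cols - shift[y], 0, w - 1)  # disparity toward the left eye
        rx = np.clip(cols + shift[y], 0, w - 1)  # disparity toward the right eye
        left[y, lx] = frame[y, cols]
        right[y, rx] = frame[y, cols]
    return left, right  # holes from disocclusion still need inpainting
```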

  • AI-Enabled Virtual Viewpoint

In FVV scenarios, the number of cameras placed around the venue is limited by space and cost constraints. As a result, users may experience visual jitter during viewpoint transitions, caused by the large disparity between adjacent camera positions. To solve this problem, AI synthesis technology can be applied between adjacent camera views to generate N virtual viewpoints, effectively filling in the missing viewpoints and ensuring smoother, more fluid transitions in FVV and bullet-time effects.
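
A crude stand-in for the synthesis step (a real deployment would use a learned view-synthesis or optical-flow model; a plain cross-fade is shown only to make the interpolation positions concrete):

```python
import numpy as np

def virtual_viewpoints(view_a: np.ndarray, view_b: np.ndarray, n: int) -> list[np.ndarray]:
    """Insert n intermediate views between two adjacent camera views.
    A real system would replace the linear blend with an AI view-synthesis
    model; the fractional camera positions are the same either way."""
    views = []
    for i in range(1, n + 1):
        t = i / (n + 1)                           # fractional camera position
        views.append(((1 - t) * view_a + t * view_b).astype(view_a.dtype))
    return views

# Example: 3 virtual cameras between two physical cameras, turning a
# 10-degree gap into 2.5-degree steps for smoother bullet-time playback.
a = np.zeros((1080, 1920, 3), dtype=np.uint8)
b = np.full_like(a, 255)
mids = virtual_viewpoints(a, b, n=3)
```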

Smart Spectator Experience at World University Games

ZTE's multi-dimensional video products integrate UHD, interactive, and AI-enhanced video capabilities to provide users with a brand-new experience. These products have been commercially deployed in industries such as education, healthcare, sports, and tourism. One notable service is smart spectator live broadcasting, which supports MVV, FVV, and UHD VR content, catering to both on-site viewers using small screens and off-site viewers on large screens. In July 2023, the 31st FISU Summer World University Games was held in Chengdu, China, and ZTE's multi-dimensional video products played a crucial role in enhancing the smart spectator experience, providing innovative features like MVV, FVV, VR, and virtual viewpoints.

As shown in Fig. 1, the smart spectator system consists of four main parts: video capture and streaming adjustment, smart media processing system, UHD CDN, and terminal video playback.

  • Video capture and streaming adjustment: It involves deploying cameras at the venue to capture real-time video and encode it. The encoded video streams are then transmitted to the system through dedicated lines or reliable networks.  
  • Smart media processing system: It is deployed in the central access office and is responsible for enhanced processing of received videos, including protocol conversion, identification, aggregation, transformation, and enhancement. These processes generate scenario-oriented video media streams.
  • UHD CDN: It is also deployed in the central access office. Its main responsibilities include distributing and storing media content. It ensures that videos are distributed to the required locations and stored or cached based on demand.
  • Terminal video playback: It involves integrating the multi-dimensional video SDK into the terminal app. The terminal app connects to the video distribution network, decodes the media streams, and renders them according to the scenario for seamless playback.
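
Read end to end, the four parts above form a simple pipeline. The sketch below is purely illustrative (all names and fields are hypothetical) and only traces how a stream flows through the stages:

```python
from dataclasses import dataclass

@dataclass
class MediaStream:
    source: str          # capture camera or rig
    codec: str
    processed: bool = False
    cached: bool = False

def capture(camera: str) -> MediaStream:
    """Venue side: capture and encode, then uplink over a dedicated line."""
    return MediaStream(source=camera, codec="HEVC")

def media_processing(s: MediaStream) -> MediaStream:
    """Central office: protocol conversion, aggregation, enhancement."""
    s.processed = True
    return s

def cdn_distribute(s: MediaStream) -> MediaStream:
    """UHD CDN: store or cache the content and push it toward edge nodes."""
    s.cached = True
    return s

def play(s: MediaStream) -> None:
    """Terminal app (with the multi-dimensional video SDK): decode and render."""
    assert s.processed and s.cached
    print(f"rendering {s.source} ({s.codec})")

play(cdn_distribute(media_processing(capture("court-cam-01"))))
```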

[Fig. 1: Architecture of the smart spectator system]

The smart spectator business scenario primarily includes innovative video forms such as frame-synchronized MVV, VR, FVV, and virtual viewpoints. It relies on advanced technologies, including full-gigabit networks, video capture, encoding, broadcasting, storage, and transmission, as well as terminal decoding and rendering, to deliver a higher-definition, multi-dimensional, and interactive viewing experience. By leveraging cutting-edge video technologies, it adds vitality to large-scale events and activities, increasing viewers' sense of immersion and engagement.