Physical presence is an entertainment experience that is irreplaceable by any other means for one simple reason: true immersion. For instance, when you travel to an exotic island, you can smell the sea breeze, sunbathe in a beach, taste the local foods, and socialize with the local people. When you attend a live concert or a sport game, you can feel the excitement of the crowd and scream out to cheer up. Such a deep immersive experience is not achievable by other means like reading travel guides, browsing scenery photos, or watching tourism videos.
Whereas physical presence is infeasible in many occasions, immersive experience can be simulated via advanced technologies such as virtual reality (VR), interactive video, photo sharing, voice interaction, and text messages, in decreasing order of engagement.
Hereby we review the key technologies to simulate immersive experience: VR and augmented reality (AR) enhance human perception with rich information; ultra-high-definition (UHD) improves the resolution (i.e. smoothness) of videos; and high-dynamic range (HDR) expands the brightness/darkness range of existing display devices to approximate the richer color space perceivable by human eyes.
VR needs to cover both eyes of a viewer with a head-mounted device (HMD) and soaks the viewer into a fully virtual environment that is totally isolated from the real surroundings. AR adds, subtracts, or replaces certain objects onto the real-world video captured by cameras and shown on a display device. AR is more complex than VR and requires more computing power.
VR may be more powerful than the reality in some sense. People who are physically located in disparate places across the globe can present in the same virtual space to meet, interact, collaborate, and play together. This transcends the inherent limitations of a real world.
Mainstream VR gears in the market today can be classified into high-end, middle-end, and entry-level categories according to the cost and technologies:
● High-end VR devices such as Oculus Rift Touch and HTC Vive provide crisp resolution, swift orientation, natural interaction, and strong sense of immersion. They require high-end desktop PCs with state-of-art GPUs for complex computing, dedicated HMDs as display, and purpose-designed gloves or controllers for natural interaction.
● Middle-end VR devices such as Samsung Gear VR rely on a smartphone for both computing and display, provide good resolution and space orientation, but lack in crispness, smoothness, and natural interaction.
● Entry-level VR devices like Google Cardboard provide basic sense of virtual reality but may cause dizziness after continuous viewing. Nevertheless, these VR gears give everyday users a chance to taste the novel VR content that are gaining popularity in YouTube 360, Oculus Store, and other content stores.
The key parameters for VR quality include video resolution, display refresh rate, and latency of sensors. A ideal resolution is 1920×1080 or higher for each eye. The current mainstream resolution is 960×1080 for one eye and 1920×1080 for two eyes combined. A ideal refresh rate for VR display is 120 fps or 240 fps. The best VR headset today can reach 90 fps. Some prototype VR devices like PlayStation VR claims to reach 120 fps. A low refresh rate may cause unrealistic feeling and dizziness. The total latency should be less than 20 ms, i.e. from the time when a head turn or hand movement is captured by various sensors, transmitted to the controller, to the time when the appropriate new position is calculated and displayed. Oculus Rift has 25 ms, much better than others in the market, usually at 40 ms. A short latency makes VR feel natural and smooth.
The creative applications of VR and AR are mushrooming. Customers in a VR-enabled car dealership can “see” all the possible models with their desired interior and exterior configurations, far beyond the limited number of display models available in the show floor. An AR-enabled fitting room allows a customer to “see” the effect of each fashion piece in different angles. In the realm of home entertainment which is traditionally only one-way delivery of video content, VR and AR are revolutionary in enabling an end user to enter into a VR/AR scene, select any angle to watch, use natural gesture or voice to interact, and thus simulate the immersive sense of the real world.
Higher resolution is always desirable for better video entertainment. The current generation video standard of full high definition (FHD) at 1920×1080 is defined by ITU-R Recommendations BT.709. The newer generation video standard of UHD at 4K or 8K resolution is defined by ITU-R Recommendation BT.2020 with three key parameters: resolution, refresh rate, and color depth.
● Resolution: UHD includes both 4K resolution at 3840×2160 and 8K resolution at 7680×4320. While support for 8K in TV sets and player devices are still expensive and nascent, UHD/4K has increasingly become economical and popular.
● Refresh rate: UHD standard has deprecated the lower-quality interlaced-scan mode in the FHD and kept only the higher-quality progressive-scan model, including p24, p25/p50, p30/p60, p24/1.001=23.976, p30/1.001=29.97, and p60/1.001=59.94. Higher refresh rates in p100, p120, and p120/1.001=119.88 are added for smoother experience in sport games.
● Color depth: In addition to the 8-bit and 10-bit modes in FHD, UHD adds a 12-bit color mode, i.e. each of the prime colors (red, green, and blue) is represented with a 12-bit value. A higher color depth enables more colors and closer-to-life display effects.
4K p30/8bits is the entry-level UHD configuration typically used in bandwidth-constrained delivery paths; 4K p60/10bits is a mainstream UHD configuration typically used in high-bandwidth delivery channels or storage media; and 4K p120/12bits is a top-end configuration typically used in production studios or premium movies or sport games.
UHD necessitates more efficient and complex codecs to save bandwidth, e.g. H.265 (HEVC) or VP9. Whereas a FHD 1920×1080 p30/8bits video in H.264 requires 6–8 Mbps, a 4K 3840×2160 p30/8bits video in H.265 requires 12–15 Mbps, namely twice code rates to represent fourfold the pixels. The HEVC Main Profile uses 8 bits, and the Main 10 Profile uses 10-bit color space and requires higher bandwidth.
The advent of UHD not only enables large-screen TV without compromising pixel granularity, but also allows a viewer to sit very close to the TV screen and enjoy an extremely wide view angle, as immersive as in a giant IMAX theater.
HDR is critical to improve video quality. It expands the brightness/contrast range beyond the current color space and thus makes a picture/video looks more vivid, i.e. both very dark pixels and very bright pixels coexist in the same frame, just like how human eyes adapt to both dark and bright light conditions.
As we have learnt from photography experience, a photo tends to get overexposed under direct sunlight and lose details about dark shades, or get underexposed in dark and lose details in the brightness levels, so the HDR mode in a camera actually takes three photos consecutively: over-exposure, normal-exposure, and under-exposure, and then mix the dark and bright portions into the same photo so as to keep the details in both ends. The net effect of HDR photography is a more vivid picture than non-HDR mode, although some HDR photos may seem too perfect to be realistic.
HDR for video is more complex than HDR photography because video frames are captured so frequently (e.g. 30/60/120 fps), and each video frame needs to be captured in different exposure modes and then composed into an HDR frame via complex calculation.
It is generally agreed in the industry that HDR is more effective in improving viewer perception of video quality than UHD/4K for at least two reasons. First, whereas UHD/4K aims to present more pixels for more details, HDR makes each pixel more accurate and vivid, and better simulates human perception. Second, whereas human perception of pixel resolution decreases dramatically by distance, human perception of colors and brightness remain relatively constant by distance. The effect of UHD/4K vs. 1080p may become indiscernible at three meters or further, but the effect of HDR vs. non-HDR remains after three meters, i.e. within a typical home media room.
Although HDR has been around for 20 years, the emergence of HDR-capable TV sets is relatively new. Mainstream non-HDR TV sets and display monitors have a limited brightness of 300–500 nits. A typical HDR-capable TV set or computer monitor needs to deliver 700–1000 nits in brightness. A high-end HDR device, e.g. Dolby Vision TV, may reach 4000 nits.
HDR may be applied to both FHD (1080p) video and 4K videos. Various user tests have found that 1080p+HDR uses less bandwidth than 4K+non-HDR but delivers better viewing experience. Of course, 4K+HDR is the ideal combination. HDR mandates a 10-bit or 12-bit color space. Whereas a typical 4Kp30/8-bit video in HEVC requires 15 Mbps, a 4Kp30/10-bit video needs 18 Mbps or higher. For 12-bit color or higher frame rate (e.g. 60 or 120 fps), the required bandwidth has to increase accordingly.
Mankind has a perpetual desire for better entertainment. Immersive experience is a prime dimension in video evolution, with mushrooming new enablers like VR, AR, UHD, and HDR. Together with other evolutional dimensions such as artificial intelligence (AI), social network sharing (SNS), and big data, immersive experience is approaching and will transcend our real-life experience in the not-so-distant future.