Editor’s Note: ZTE CEO Xu Ziyang delivered a keynote speech at the “AI First” session at MWC Shanghai 2024, sharing ZTE’s practices and innovations in intelligent digitalization amid the AI wave.
Embracing Changes and Promoting Intelligent Evolution
Over the past year, large language models and generative AI have accelerated the transformation towards an increasingly digital and intelligent world. With the rapid emergence of new technologies and products, new business scenarios and models are also gaining momentum. Although generative AI is still at an early stage, there is a growing consensus that the world has already entered an AI-driven industrial revolution. AI will have disruptive and far-reaching impacts on all aspects of production and life, and will significantly reshape the global economic landscape. According to forecasts by a consulting firm, by 2030, AI will boost China's GDP by 26% and North America's by 14.5%. Together, these gains are equivalent to USD 10.7 trillion and account for almost 70% of the global economic impact. Clearly, AI will bring unprecedented business opportunities in sectors such as retail, financial services, and healthcare.
Apart from issues concerning hallucinations, security, and ethics, the development of generative AI also faces challenges in computing power, energy consumption, datasets, standardization, and commercial application. Advancements in multiple areas are therefore required. To this end, ZTE proposes three major principles: computing and network evolution, training and inference enhancement, and openness and decoupling.
First, to break through technology bottlenecks, it’s crucial to strengthen research on architectures, algorithms, computing networks, and hardware-software synergy, thus improving AI training and inference efficiency. Second, various solutions such as retrieval-augmented generation (RAG) and AI agents should be employed to ensure reliability, security, and interpretability, thereby facilitating the widespread application of large models and higher value creation, and building a data flywheel that improves both capabilities and business efficiency. Finally, accelerating standardization through openness and decoupling will help build a thriving industrial and commercial ecosystem.
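As a rough illustration of the retrieval-augmented generation (RAG) idea mentioned above, the following toy sketch retrieves the most relevant snippets from a small knowledge base and places them in the prompt so that an answer can be grounded in them. Everything here is an assumption made for illustration: the function names, the made-up knowledge base entries, and the word-overlap scoring, which stands in for the vector search a real system would more likely use. No model is called; the sketch stops at building the grounded prompt.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query and return the best top_k."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    # Highest overlap first; the sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Documents that share no word with the query are never returned.
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query, documents, top_k=2):
    """Assemble a prompt that asks the model to answer from retrieved context only."""
    context = retrieve(query, documents, top_k)
    lines = ["Answer using only the context below.", "Context:"]
    lines += [f"- {doc}" for doc in context]
    lines.append(f"Question: {query}")
    return "\n".join(lines)

# Hypothetical knowledge base entries, invented purely for this example.
KNOWLEDGE_BASE = [
    "Alarm 1001 means the optical link power is below threshold.",
    "Alarm 2002 means the board temperature is too high.",
    "Routine maintenance windows are scheduled on Sundays.",
]

if __name__ == "__main__":
    print(build_prompt("What does alarm 1001 mean?", KNOWLEDGE_BASE, top_k=1))
```

Because the answer is tied to retrieved text, the source of each statement can be traced, which is one way RAG supports the reliability and interpretability goals described above.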
Building a Highly Efficient Foundation Through Computing and Network Evolution
To begin with, we emphasize computing and network evolution to build a highly efficient foundation. For intelligent computing, high-speed network connections are indispensable. From die-to-die (D2D) connectivity to interconnects of chips, servers, and data centers, continuous innovation and breakthroughs in network technology will greatly enhance the performance and efficiency of intelligent computing.
More specifically, die-to-die emphasizes high-speed interconnects between bare dies in a single package. Combined with the full series of in-house parallel and serial D2D interface IPs, as well as the advanced 2.5D and 3D packaging technologies, our solution enables heterogeneous integration and disaggregation. To a certain extent, challenges arising from the slowdown of Moore's Law and constraints in manufacturing can be effectively mitigated. We have developed chip architectures that enable heterogeneous computing and network processing, which in turn deliver enhanced performance and cost efficiency.
Chip-to-Chip focuses on interconnects across chips, which enables the development of a solution that integrates distributed high-speed interconnect and a full range of interfaces including PCIe 5/6 and 56G/112G/224G SerDes. This solution can effectively address the inflexibility and low bandwidth utilization of the mesh interconnect architecture. Furthermore, with the tensor parallelism degrees of up to 8/16, this solution better adapts to complex, large-scale intelligent computing scenarios, thus providing customers with differentiated competitive edges. By virtue of such a solution, ZTE has played an important role in a major operator’s advancement of the OISA architecture.
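To make the notion of a tensor parallelism degree concrete, here is a minimal sketch of column-wise tensor parallelism for a single matrix multiplication. It is a generic textbook illustration rather than ZTE's implementation: the function names are hypothetical, the "devices" are simulated in one process, and plain Python lists stand in for tensors.

```python
def matmul(x, w):
    """Multiply matrix x (m x k) by matrix w (k x n), both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, degree):
    """Split w column-wise into `degree` equal shards, one per simulated device."""
    n_cols = len(w[0])
    if n_cols % degree != 0:
        raise ValueError("column count must be divisible by the parallelism degree")
    width = n_cols // degree
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(degree)]

def tensor_parallel_matmul(x, w, degree):
    """Compute x @ w by letting each simulated device handle one column shard."""
    shards = split_columns(w, degree)
    # In a real system each partial product would run on its own chip.
    partials = [matmul(x, shard) for shard in shards]
    # Gather step: concatenate every device's output columns, row by row.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 0, 3, 0, 4, 0],
         [0, 1, 0, 2, 0, 3, 0, 4]]
    # With degree 8, each simulated device owns exactly one column of w.
    print(tensor_parallel_matmul(x, w, degree=8) == matmul(x, w))
```

The gather step is where interconnect bandwidth matters: every device must exchange its partial result with the others on each layer, which is why higher parallelism degrees put more pressure on chip-to-chip links.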
Furthermore, to meet the requirements of heterogeneous integration of photonic and electronic ICs for the next-gen 102.4T network switches, linear-drive pluggable optics (LPO) and co-packaged optics (CPO) can significantly enhance the interconnect density and reduce power consumption. In addition, optical I/O brings revolutionary improvement in bandwidth density, power efficiency, and latency.
Server-to-Server covers interconnects among intelligent computing clusters. In this regard, for example, ZTE has been fully collaborating with China Mobile to foster a robust industry ecosystem of global scheduling ethernet (GSE), aiming to build a non-blocking, high-bandwidth and ultra-low-latency network for the next-generation intelligent computing centers. In February, we participated in the GSE prototype interoperability test organized by China Mobile. In future scenarios involving thousands or tens of thousands of GPUs, ZTE will work with partners to promote intelligent innovation, contributing to the development of the industry chain. In terms of capability enhancement, ZTE will continue to improve the forwarding capacity of key chips from 12.8 Tbps to 51.2 Tbps. By offering diverse solutions ranging from single-layer, box+box, to chassis+box, we aim to better serve the needs of AI training in all scenarios.
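The effect of raising switch chip capacity from 12.8 Tbps to 51.2 Tbps on cluster scale can be estimated with some simple arithmetic. The sketch below assumes 400 Gbps ports and a non-blocking (1:1) design, and it reads "box+box" as a two-tier leaf-spine fabric in which each leaf uses half its ports for endpoints and half for uplinks. These assumptions and the function names are illustrative and are not taken from the speech.

```python
def ports_per_switch(capacity_tbps, port_speed_gbps):
    """Number of ports a switch chip can offer at the given port speed."""
    return round(capacity_tbps * 1000) // port_speed_gbps

def max_endpoints_single_layer(radix):
    """A single switch connects at most one endpoint per port."""
    return radix

def max_endpoints_two_tier(radix):
    """Non-blocking leaf-spine: each leaf splits its ports evenly between
    endpoints and uplinks, and each spine port serves one leaf."""
    endpoints_per_leaf = radix // 2
    max_leaves = radix
    return endpoints_per_leaf * max_leaves

if __name__ == "__main__":
    for capacity in (12.8, 51.2):
        radix = ports_per_switch(capacity, port_speed_gbps=400)
        print(capacity, "Tbps:", radix, "ports,",
              max_endpoints_single_layer(radix), "endpoints single-layer,",
              max_endpoints_two_tier(radix), "endpoints two-tier")
```

Under these assumptions, a fourfold increase in chip capacity yields a sixteenfold increase in two-tier scale, since endpoint count grows with the square of the port count.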
DC-to-DC falls into the scope of wide-area connectivity. 400G OTN lays a solid foundation for the intelligent interconnection of data centers. ZTE has made every effort to facilitate the commercial deployment of the world's largest 400G OTN, and has collaborated with operators to demonstrate high-capacity connections of Real 400G. Together, we have completed the industry's first real-time Tbit-level transmission on a single fiber based on S+C+L wavelength bands, setting a world record for transmission distance. Going forward, ZTE will leverage its strengths in connectivity to build efficient, all-optical networks for intelligent computing.
Empowering Various Sectors with Enhanced AI Training and Inference
For the application of large models across various sectors, in addition to the common issues concerning hallucinations, security, and ethics, it is also necessary to address a series of challenges in critical scenarios, such as expertise, accuracy, robustness, and traceability. Meanwhile, building domain-specific models on top of a foundation model also requires expertise in data governance and incremental training, along with the related engineering experience and toolsets. Taking telecom autonomous networks as an example, technological innovations such as the integration of large and small AI models, RAG, multi-agent collaboration, digital twins, and multimodal chain of thought (CoT) have all yielded promising results.
In the commercialization of AI applications, training and inference are crucial to the leapfrog growth of the real economy. With massive application scenarios and proprietary data, China can make significant contributions to the global AI industry. To fully leverage this advantage, we must focus on enhancing accuracy, expertise, and inference efficiency, while also strengthening domain-specific data governance and the application of digital twin technology. Training boosts AI capabilities, while inference and application bring commercial value. This mutually reinforcing cycle establishes a data flywheel, which further accelerates the improvement and monetization of AI capabilities and in turn makes them our core competence.
So how can this be achieved? We emphasize partnerships with high-value industries and angel customers. As the most influential players in an industry, angel customers are usually equipped with robust digital infrastructure. They actively embrace technological transformation and can lead the entire industry in the process of digital and intelligent evolution. By collaborating with these customers, we can not only integrate intelligent technologies with industry know-how, but also rapidly validate and refine technical solutions to create exemplary use cases. Based on in-house or open-source foundation models, we can develop domain-specific large models with extensive industry data and knowledge engineering, thus making breakthroughs from 0 to 1. These domain-specific models can then be scaled from 1 to N by adapting them to different application scenarios.
Openness and Decoupling for a Prosperous Ecosystem
Finally, it’s about embracing openness and decoupling to build a prosperous ecosystem. Despite the fast iteration of AI technologies, the current AI ecosystems remain closed, with industry-wide standards yet to be developed. This leads to several problems, such as resource waste caused by redundant development, risk concentration due to technology silos, and supply chain monopolies as a result of limited choices, all of which constrain the rapid and healthy development of AI.
Against this backdrop, ZTE proposes a full-stack, open intelligent computing solution.
At the infrastructure level, hardware-software synergy maximizes resource efficiency. Specifically, the hardware is compatible with mainstream GPUs/CPUs in China and abroad, and supports open standards such as OISA and RoCE/GSE for high-speed and lossless interconnection, offering customers a variety of choices. The software supports heterogeneous resource management, training and inference job scheduling, and heterogeneous collective communication. Compatible with GPUs from multiple manufacturers, the software enables a high-performance and reliable runtime environment for models. In addition, technologies such as computation offloading and in-network computing significantly reduce data read, write, and transmission time, thereby improving computing utilization.
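For readers unfamiliar with collective communication, the sketch below simulates a ring all-reduce, a widely used collective in which every worker ends up holding the element-wise sum of all workers' data. It is a generic illustration under simplifying assumptions (one value per chunk, all workers simulated in a single process, hypothetical function name) and does not represent ZTE's software.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce over n workers, each holding n values.

    Returns a list in which every worker's buffer equals the element-wise sum.
    """
    n = len(buffers)
    data = [list(b) for b in buffers]

    # Phase 1, reduce-scatter: after n-1 steps, worker i holds the fully
    # reduced value of chunk (i + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so that sends within a step are simultaneous.
        sends = [((i + 1) % n, (i - step) % n, data[i][(i - step) % n])
                 for i in range(n)]
        for dst, chunk, value in sends:
            data[dst][chunk] += value

    # Phase 2, all-gather: circulate the reduced chunks around the ring so
    # that every worker receives every chunk.
    for step in range(n - 1):
        sends = [((i + 1) % n, (i + 1 - step) % n, data[i][(i + 1 - step) % n])
                 for i in range(n)]
        for dst, chunk, value in sends:
            data[dst][chunk] = value

    return data

if __name__ == "__main__":
    gradients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(ring_all_reduce(gradients))
```

Each worker only ever talks to its ring neighbor, so the per-worker traffic stays roughly constant as the ring grows. This is one reason the interconnect bandwidth and latency discussed earlier have such a direct effect on training efficiency.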
In terms of capability platforms, the solution adapts to mainstream frameworks such as PyTorch and TensorFlow, enabling automatic backend compilation and optimization. Also, it provides an end-to-end engineering toolkit for data processing as well as the development, training, optimization, evaluation, and deployment of models. In addition to full lifecycle assurance and management, the solution also supports compute-native networking, heterogeneous training, efficient inference, and data flywheel building.
As for computing networks, the solution enhances computing and network synergy, facilitating seamless application migration across domains.
As a Chinese saying goes, "A single flower does not make spring, while one hundred flowers in full blossom bring spring to a garden." Similarly, a full-stack, open intelligent computing solution contributes to an open technology ecosystem and a win-win business ecosystem, which will in turn advance the healthy development of intelligent computing.
Through the decoupling of hardware and software, training and inference, and models, composable capabilities can be developed and widely shared, accelerating the innovation, R&D, application, and commercial use of AI technologies. All of these contribute to an open technology ecosystem.
Through collaboration between chip manufacturers, ICT hardware manufacturers, application developers, integrators, and operators within the industry, we can grow stronger together, building a win-win business ecosystem.
Digital Infrastructure and AI Transformation for a Better Future
Focusing on customer value, ZTE provides a full-stack and full-scenario intelligent computing solution involving computing power, networks, capabilities, intelligence, and applications. Multiple key technologies are also in place, such as high-speed interconnection, in-network computing, compute-native networking, seamless migration, data processing, and algorithm optimization. We focus on building efficient, green, and secure computing networks as digital infrastructure, and apply flexible, agile, and intelligent capabilities and applications for AI transformation. With the fully open composable R&D architecture of ZTE Digital Nebula, we can flexibly collaborate with customers by giving full play to our complementary strengths, thus empowering the digital transformation of industries.
In terms of digital infrastructure, ZTE provides a full series of computing, storage, network, and data center products and solutions, to fulfill various construction needs for intelligent computing centers from the core to the edge.
Regarding computing power, our chips are compatible with GPUs/CPUs from multiple manufacturers in China and abroad. Also, a series of products, including AI servers based on mounted modules/PCIe cards and integrated training and inference cabinets are developed to flexibly adapt to various scenarios. In addition, relying on energy-saving technologies such as hybrid cooling and scalable power distribution, we have developed a new intelligent computing center solution, with a PUE as low as 1.1 and a maximum cabinet power density of up to 60 kW.
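To put the PUE figure in perspective: power usage effectiveness is defined as total facility power divided by IT equipment power. The short sketch below applies that definition to a cabinet running at the 60 kW maximum density quoted above; the function names are illustrative, and treating a single cabinet in isolation is a simplification.

```python
def facility_power_kw(it_power_kw, pue):
    """Total facility power implied by an IT load and a PUE value."""
    return it_power_kw * pue

def overhead_power_kw(it_power_kw, pue):
    """Power spent on cooling, power distribution, and other non-IT loads."""
    return it_power_kw * (pue - 1.0)

if __name__ == "__main__":
    it_load = 60.0  # kW, the maximum cabinet density quoted in the text
    for pue in (1.1, 1.5):  # 1.5 is an arbitrary comparison point, not a cited figure
        print(f"PUE {pue}: total {facility_power_kw(it_load, pue):.1f} kW, "
              f"overhead {overhead_power_kw(it_load, pue):.1f} kW")
```

At a PUE of 1.1, a fully loaded 60 kW cabinet implies about 6 kW of non-IT overhead, that is, roughly 0.1 kW of overhead for every kilowatt delivered to computing equipment.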
In the realm of networking, ZTE, together with industry partners, has proposed an open GPU interconnect standard known as OpenLink (OLink), which will be fully integrated into the OISA architecture of China Mobile. Compatible with the RDMA protocol for unified intra-node and inter-node communication, OLink focuses on improving intra-node communication by shifting from mesh interconnects to switch-based interconnects. This solution takes advantage of tensor parallelism for large-scale computing on a single node, reducing connection complexity and improving cluster scale and efficiency. In addition, the self-developed RoCE NICs as well as box and chassis RDMA switches support the building of intelligent computing clusters ranging from 100+ to 10,000+ GPUs. Also, the Real 400G solution helps build efficient and all-optical networks for intelligent computing.
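The reduction in connection complexity from moving to a switch-based interconnect can be seen with a simple link count. The sketch below compares a full mesh, where every GPU links directly to every other GPU, with an idealized design in which each GPU has a single link to one switch. The single-switch, single-link model and the function names are simplifying assumptions for illustration; real designs typically use several links and switch planes.

```python
def full_mesh_links(gpus):
    """Point-to-point links needed so that every GPU pair is directly connected."""
    return gpus * (gpus - 1) // 2

def switch_based_links(gpus):
    """Links needed when each GPU has one connection to a shared switch."""
    return gpus

if __name__ == "__main__":
    for gpus in (8, 16, 32):
        print(gpus, "GPUs:", full_mesh_links(gpus), "mesh links vs",
              switch_based_links(gpus), "switch links")
```

Mesh link count grows quadratically with node size while the switch-based count grows linearly, so the gap widens quickly as more GPUs are placed in a single node.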
For capability enhancement, we have developed various capability platforms. ZTE TECS, our unified cloud management platform, supports heterogeneous resource management, training and inference job scheduling, and heterogeneous collective communication. Meanwhile, ZTE AIS, a training and inference platform, can be applied to data processing and development of large models, providing full-stack engineering toolkits and engines. Taking seamless migration as an example, availability can be achieved within 5 days, and optimal performance within 15 days. Moreover, efficient inference enables the deployment of trillion-parameter models on a single GPU, and automated data labeling saves 80% of the time.
With regard to intelligence, ZTE Nebula Large Model focuses on algorithm innovation, data engineering, and efficient computing. The foundation model is available in various parameter sizes such as 2.5B, 16B, 40B, and 100B, and can be deployed in mobile, edge, and central cloud scenarios.
In the training phase, innovative technologies are applied, such as multi-stage pre-training, Chinese vocabulary improvement, high-quality corpus refinement, and synthetic data training. These innovations ensure effective model training while reducing the consumption of computing power by 50%.
As for inference efficiency, by quantizing weights to INT4 and the KV cache to FP8, inference resource consumption is reduced threefold without compromising model accuracy.
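The following sketch illustrates what quantizing weights to INT4 means in its simplest form, and how weight storage shrinks as a result. It uses symmetric per-tensor quantization, which is an assumption chosen for brevity; the speech does not describe the actual scheme, and production systems usually quantize per group or per channel. The function names are hypothetical, and the 40B parameter count is simply one of the model sizes listed earlier.

```python
def quantize_int4(weights):
    """Symmetric per-tensor quantization of floats to integers in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [q * scale for q in quantized]

def weight_storage_gb(num_params, bits_per_weight):
    """Storage needed for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

if __name__ == "__main__":
    weights = [0.42, -1.30, 0.07, 0.95, -0.61]
    q, scale = quantize_int4(weights)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("quantized:", q)
    print("worst-case rounding error:", worst, "(bounded by scale / 2 =", scale / 2, ")")
    params = 40_000_000_000
    print("FP16 weights:", weight_storage_gb(params, 16), "GB")
    print("INT4 weights:", weight_storage_gb(params, 4), "GB")
```

Weights alone shrink fourfold when going from 16 bits to 4 bits, and an 8-bit KV cache is half the size of a 16-bit one. The overall saving depends on the mix of weights, cache, and activations in a given deployment, so this sketch does not attempt to reproduce the figure quoted above.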
Building on these advancements, the Nebula Telecom Large Model has been trained on a mixture-of-experts (MoE) architecture with trillion parameters (9*20B). This model supports multimodal input and a context window of 120,000 tokens, providing expert-level insights and assistance for telecom business scenarios. At the same time, through multi-model collaboration, the Nebula R&D Model can be used in over 30 scenarios across the entire process, from requirement analysis and design to programming and testing. It can also generate code in multiple programming languages, such as Python, Java, C/C++, Go, and JavaScript, achieving performance on par with GPT-4. Additionally, it significantly surpasses GPT-4 Turbo in the accuracy and coverage of unit testing, and can directly generate test cases based on requirements (test-driven development).
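As background on the mixture-of-experts architecture mentioned above, the sketch below shows generic top-k expert routing: a gate scores every expert for a given input, only the highest-scoring experts are run, and their outputs are blended by normalized gate weights. The nine experts mirror the expert count in the text, but the top-2 selection, the toy expert functions, and all names are assumptions for illustration; the speech does not describe the model's actual routing.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, top_k=2):
    """Pick the top_k experts and renormalize their gate weights to sum to 1."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

def moe_forward(x, experts, gate_scores, top_k=2):
    """Run only the selected experts and blend their outputs by gate weight."""
    weights = route(gate_scores, top_k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Nine toy experts; expert k simply multiplies its input by k + 1.
EXPERTS = [lambda x, k=k: x * (k + 1) for k in range(9)]

if __name__ == "__main__":
    gate_scores = [0.0] * 9
    gate_scores[3] = 5.0
    gate_scores[7] = 5.0
    print(route(gate_scores))
    print(moe_forward(2.0, EXPERTS, gate_scores))
```

Because only the selected experts run for each input, the compute per token is much smaller than the total parameter count would suggest, which is the main appeal of the MoE approach.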
Speaking of applications, ZTE is actively exploring the practical use of large models across industries. ZTE's large models have been applied in various fields such as R&D efficiency improvement, telecom network O&M, urban governance, and industrial parks. In particular, the Nebula Coder Model ranks among the top tier in terms of HumanEval scores. It currently has over 13,000 daily active users, handles more than 110,000 daily requests, and processes up to 330 million tokens per day. This model has improved coding efficiency by 30% and overall R&D efficiency by 10%. Meanwhile, through an end-to-end intelligent computing platform, ZTE provides customers with a whole-process large model toolchain, lowering the barriers to AI adoption and reducing development and usage costs. It can be said that ZTE Nebula Large Model truly brings intelligence to various industries.
To conclude, ZTE adheres to the principles of diversity, collaboration, and openness for win-win success, and firmly supports and promotes the prosperous development of industries. Meanwhile, we will double our efforts in intelligent computing, continuously leading innovations and development. Going forward, ZTE will continue to increase R&D investment and dedicate its efforts to achieving technological leadership in multiple fields. In this way, we hope to further promote the growth of the intelligent computing industry, contributing to economic prosperity.