ZTE AI Full-Stack Intelligent Computing Solution: Empowering Various Industries

Release Date: 2024-05-16 By Wang Weibin, Lu Guanghui

To date, AI has experienced three major waves and two downturns. In November 2022, OpenAI released ChatGPT, a generative AI product built on transformer algorithms and pre-trained AI models. This propelled the third wave of AI development to unprecedented heights, marking a turning point and a peak of excitement around AI models.

Generative AI can create new content, imitate human creativity and innovation, and play an important role in numerous fields, driving the prosperity and advancement of AI. Scale is working wonders: bigger models are bringing greater intelligence. As AI technologies evolve, various industries will use AI to enhance operational efficiency, create business value, and move from a digital world to a digital and intelligent one.

Faced with the opportunities and challenges of generative AI, ZTE steadfastly maintains its role as a driver of the digital and intelligent economy, striving to be the ultimate AI company and to serve as a model for leading AI enterprises. ZTE is also committed to helping industries build end-to-end intelligent computing infrastructure and digital transformation solutions. Building on its universal computing solution, ZTE has launched the Nebula intelligent computing solution, which is guided by the principles of openness, efficiency, intelligence, and security and is oriented towards training and inference scenarios. The solution spans intelligent infrastructure, AI platforms, AI models, and applications, empowering operators to build intelligent computing centers and drive digital intelligent transformation across industries (Fig. 1).

Intelligent Computing Infrastructure: Efficient and Secure

ZTE’s intelligent computing infrastructure layer, comprising IDCs, AI computing, integrated storage, a lossless network, and a resource management platform, supports the construction of diverse, multi-layered intelligent computing infrastructure. It ranges from AI model training & intelligent computing centers and hybrid training & intelligent computing centers to edge training integrated computers. Each layer caters to specific performance, cost, and service needs in different scenarios. This multi-layer design enhances adaptability, flexibility, and user choice.

  • Efficiency Is the Priority

Training a single AI model is costly, so an efficient intelligent computing infrastructure is required. ZTE builds such infrastructure through hardware, resource management, and product solutions.

In hardware selection, ZTE chooses processors with high computing power, large video memory, and high-speed interconnection, together with high-performance, high-concurrency storage, to improve the system’s parallel efficiency and thereby boost cluster computing power. Additionally, ZTE’s independently developed DPU smart NIC provides a lossless network with ultra-large bandwidth and ultra-low latency, enhancing overall reliability and efficiency.

On the resource management side, multiple heterogeneous hardware devices are connected and managed together to meet the need for efficient resource management in AI model training and inference. ZTE’s AI resource management platform, TECS, provides job scheduling and intelligent computing cluster management, including computing enhancement (such as vGPU technology), storage enhancement (such as support for high-performance file storage), network enhancement (such as support for integrated communication technology), and cluster management and scheduling.

TECS is an enhanced version of ZTE’s original self-developed general computing resource management product, tailored to the requirements of AI model training and inference. It can be integrated seamlessly with the original product to achieve unified management and orchestration of general and intelligent computing, or deployed independently to manage and orchestrate intelligent computing resources on their own.
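To make the idea of job scheduling over shared GPU resources more concrete, the following is a minimal Python sketch. It is not TECS’s interface or algorithm, which the article does not describe. The `Gpu` class, the `schedule` function, the job names, and the best-fit policy are all hypothetical and chosen only for illustration. The sketch assumes each job requests a fraction of one GPU, loosely mirroring how vGPU technology lets several small inference jobs share a device while a training job takes a whole one.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical model of one GPU that can be shared in fractions."""
    name: str
    free: float = 1.0  # fraction of the device still unallocated
    jobs: list = field(default_factory=list)


def schedule(jobs, gpus):
    """Best-fit placement of fractional GPU requests (illustrative policy only).

    jobs: list of (job_name, gpu_fraction) with 0 < gpu_fraction <= 1.
    Returns a dict mapping job_name to a GPU name, or None if nothing fits.
    """
    placement = {}
    # Place the largest requests first so whole-GPU jobs are not crowded out.
    for job_name, need in sorted(jobs, key=lambda j: -j[1]):
        candidates = [g for g in gpus if g.free + 1e-9 >= need]
        if not candidates:
            placement[job_name] = None  # job must wait for capacity
            continue
        # Best fit: pick the GPU with the least free capacity that still fits.
        target = min(candidates, key=lambda g: g.free)
        target.free -= need
        target.jobs.append(job_name)
        placement[job_name] = target.name
    return placement


if __name__ == "__main__":
    cluster = [Gpu("gpu0"), Gpu("gpu1")]
    requests = [("train", 1.0), ("infer-a", 0.5), ("infer-b", 0.5), ("infer-c", 0.25)]
    for job, gpu in schedule(requests, cluster).items():
        print(job, "->", gpu)
```

Packing small jobs onto already partly used devices keeps whole GPUs free for training jobs, which is one common motivation for fractional sharing. A production scheduler would also account for memory, interconnect topology, and priorities.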

In terms of product solutions, ZTE has launched an all-in-one (AIO), out-of-the-box training machine that precisely targets industry secondary training and real-time inference service scenarios, as shown in Fig. 2. The AIO machine integrates computing, storage, network devices, and AI platform software, and supports mainstream AI frameworks. It minimizes the cost of training and inference for private domain models while lowering the technical threshold. Users do not need to go through complex deployment and configuration procedures, so the machine can be put into operation quickly, with training and inference resources allocated flexibly.

  • Security Is the Basis

Among the three basic elements of AI (computing power, algorithms, and data), computing power is the core element and the primary driver for the comprehensive advancement and rapid application of AI systems. It is therefore critical to provide secure and reliable computing power. ZTE focuses on developing intelligent computing power tailored to AI model training and inference scenarios, and strives to establish multi-channel supply chains both domestically and internationally. ZTE provides a complete set of mature solutions based on high-performance AI servers and InfiniBand (IB) switches sourced from leading GPU manufacturers worldwide. In addition, through extensive collaboration with leading domestic GPU manufacturers, ZTE has undertaken significant self-development efforts, providing diverse end-to-end intelligent computing solutions. These include high-performance AI servers using chips from the leading GPU manufacturers, box-type and frame-type RoCE switches, and distributed storage servers that support high-performance, multi-dimensional storage (such as files, objects, blocks, and big data).

In addition, training AI models with tens of billions of parameters is time-consuming because of the large volume of training data. To ensure stability and reliability and to limit interruptions caused by hardware faults, ZTE’s TECS resource management platform offers a secure visual management tool for automatic monitoring. It also supports resuming training from a breakpoint (checkpoint), minimizing interruption time and greatly reducing losses.
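The general idea behind resuming training after a fault can be sketched as follows. This is a minimal, standard-library-only Python illustration, not the TECS mechanism, which the article does not detail. The function names, the JSON checkpoint format, the toy `weight` update, and the `fail_at` parameter that simulates a hardware fault are all assumptions made for the example. Real training jobs would save model and optimizer state through their AI framework instead.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically so a crash mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weight": 0.0}


def train(path, total_steps, checkpoint_every=10, fail_at=None):
    """Toy training loop in which 'weight' moves toward 1.0.

    fail_at (hypothetical) raises an error at that step to mimic a fault.
    """
    state = load_checkpoint(path)  # resume from the breakpoint if present
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated hardware fault")
        state["weight"] += 0.1 * (1.0 - state["weight"])
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    try:
        train(ckpt, total_steps=50, fail_at=25)
    except RuntimeError as err:
        print("interrupted:", err)
    print("resumed from step", load_checkpoint(ckpt)["step"])
    print("finished at step", train(ckpt, total_steps=50)["step"])
```

In this sketch, only the work done since the last checkpoint is lost when a fault occurs, so the checkpoint interval trades extra write overhead against the amount of training that must be repeated.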

AI Platform: Openness and Decoupling

ZTE’s AI platform, centered on openness and decoupling, offers a complete set of AI products. It provides a unified programming environment and tool chain, minimizing model development and migration costs while facilitating ecosystem construction.

To help developers and users better develop, train, evaluate, implement, and update AI models, ZTE provides a component-based AI platform called AI studio (AIS). AIS can be integrated into the intelligent computing AIO system or into AI applications, and covers workflows such as data collection, data annotation, model training, model fine-tuning, knowledge base management, compilation optimization, and inference deployment. Supporting PyTorch and other mainstream AI frameworks, it delivers end-to-end intelligent computing center solutions, along with model capabilities and operational engines for AI applications.

AI Models and Applications: From Universal to Dedicated

ZTE has articulated its strategy for empowering enterprises’ digital transformation with large-scale models as "1+N+X", transitioning from "universal" to "dedicated".

  • “1” Base AI Model Series

Using its engineering capabilities, ZTE independently develops the Nebula series of base AI models, including NLP and multimodal models. Trained on vast amounts of data with unsupervised or self-supervised learning methods, these models demonstrate exceptional understanding and expression capabilities across different tasks and domains.

  • “N” Domain Models

Built on the base AI model, each domain AI model improves its professional capabilities through incremental pre-training on domain know-how. In the R&D field, ZTE has used AI model technologies since 2022 to enhance R&D efficiency, assisting R&D personnel in requirement analysis, product design, programming, testing, version release, and documentation. At present, ZTE’s coding AI models rank among the top performers on the HumanEval benchmark, with industry-leading capabilities across diverse programming languages and in Chinese. In the telecom realm, ZTE leverages vast amounts of high-quality network O&M and service operation data to enrich its telecom AI models, giving them an edge in telecom knowledge. These AI models support multimodal data in the communication field, addressing complex issues such as coverage, capacity, performance reports, and network presentation. Moreover, ZTE’s telecom AI models incorporate a robust intent engine integrated with autonomous networks, enhancing operators’ network operation efficiency through efficient workflow cascades.

  • “X” AI Scenario-Based Applications

ZTE has developed a variety of segment-specific applications based on domain AI models, such as computer vision AI models. These include the urban lifeline solution for comprehensive urban security risk detection and early warning across critical infrastructure such as water, power, gas, heat, and transportation. ZTE also offers a one-stop AI development assistant covering the entire R&D process, built upon the AI coding model. Using the AI network model, ZTE has developed a range of O&M tools, such as fault O&M robots, tailored to different scenarios. Moreover, ZTE has created an SMS anti-fraud service application based on the AI language model.

Rich Applications: Empowering Customers’ Digital Intelligent Transformation

To help operators and partners build end-to-end intelligent computing infrastructure and digital transformation solutions for enterprises, ZTE has launched the open and decoupled Nebula intelligent computing solution. This solution provides AI full-stack products and has been deployed in areas including intelligent computing centers, R&D efficiency improvement, communications, anti-fraud governance, and urban governance. In the communications sector, ZTE released the industry’s first AI-model-based SMS anti-fraud governance system in 2023. In industry domains, ZTE cooperates with numerous partners, signing strategic agreements and implementing multiple projects spanning machine vision, industrial production, and more. As a cornerstone technology of digital transformation, AI models play a pivotal role in the evolution and commercial prosperity of numerous industries in the new era. ZTE is ready to seize this significant opportunity together with its partners, so that AI can bring benefits to thousands of industries.