In the 5G era, network architecture and service models have become increasingly complex. The traditional O&M mode, which relies on expert experience, suffers from inefficient fault location and lengthy troubleshooting, making it a bottleneck for network O&M. Passive O&M, triggered by user complaints only after service quality has degraded or a fault has occurred, severely affects user experience. To improve service quality in existing 5G SPN networks, boost user satisfaction, and drive the automation, digitization, and intelligent transformation of network O&M, ZTE and China Mobile Liaoning (Liaoning Mobile for short) jointly proposed an intelligent closed-loop system solution (Fig. 1). The solution was verified in the existing network on ZTE’s intelligent management and control system, ZENIC ONE (UME).
This solution extends alarm compression and root cause analysis capabilities based on intelligent rule-orchestrated fault diagnosis. It also deploys group fault analysis and automatic service quality maintenance (intent maintenance) functions, enabling intelligent closed-loop management. This includes real-time service status perception, automatic delimitation and location during fault analysis, and minute-level service recovery in certain scenarios. With this solution, SPN fault location accuracy can reach 95%, and overall O&M efficiency can increase by over 35%, as service quality problems are identified more efficiently and precise solutions are provided. By transitioning from manual and passive O&M to automatic and active O&M in certain scenarios, network maintenance efficiency and service security are enhanced, customer complaint rates are reduced, and customer satisfaction is improved.
Intelligent Rule-Orchestrated Fault Diagnosis
Traditional manual fault analysis relies heavily on the experience of professional personnel and involves extensive, time-consuming tasks such as alarm filtering, correlation analysis, tool preparation, and fault locating. It is also difficult to pass on O&M experience and train new personnel. ZTE proposes an intelligent fault diagnosis solution that centralizes and modularizes previously scattered diagnosis tools such as ping, IOAM, RCA, and configuration check. For different service scenarios, maintenance personnel can orchestrate these modular diagnosis rules themselves and save the resulting solutions to a library. During fault location, the system automatically selects and executes a diagnosis solution based on the fault type. Currently, the system can locate several types of service faults, including connectivity (connection/disconnection), packet loss, and clock faults. This feature allows mature diagnosis rules to be captured quickly, so that O&M personnel can invoke them at any time. It addresses the slow transfer of O&M knowledge and the long cycles needed to accumulate maintenance experience. Diagnosis time is shortened from hours to minutes, greatly improving fault location efficiency.
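The orchestration idea described above can be sketched as a small rule library that maps a fault type to an ordered list of modular diagnosis steps. This is only an illustrative sketch, not the ZENIC ONE (UME) implementation: the names `RuleLibrary`, `ping_check`, and `config_check`, the context dictionary, and its keys are all hypothetical, and the steps are stand-ins for real probes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A diagnosis step inspects a context dict and returns a finding, or None if healthy.
DiagnosisStep = Callable[[dict], Optional[str]]


def ping_check(ctx: dict) -> Optional[str]:
    # Hypothetical stand-in for a real ping probe.
    return None if ctx.get("ping_ok", True) else "peer unreachable"


def config_check(ctx: dict) -> Optional[str]:
    # Hypothetical stand-in for a configuration consistency check.
    return "MTU mismatch" if ctx.get("mtu_mismatch") else None


@dataclass
class RuleLibrary:
    # Maps a fault type to the ordered steps orchestrated for it.
    rules: Dict[str, List[DiagnosisStep]] = field(default_factory=dict)

    def register(self, fault_type: str, steps: List[DiagnosisStep]) -> None:
        """Save an orchestrated diagnosis solution for a fault type."""
        self.rules[fault_type] = list(steps)

    def diagnose(self, fault_type: str, ctx: dict) -> List[str]:
        """Run the steps registered for the fault type and collect findings."""
        findings = []
        for step in self.rules.get(fault_type, []):
            result = step(ctx)
            if result is not None:
                findings.append(f"{step.__name__}: {result}")
        return findings


library = RuleLibrary()
library.register("packet_loss", [ping_check, config_check])
report = library.diagnose("packet_loss", {"ping_ok": True, "mtu_mismatch": True})
print(report)  # ['config_check: MTU mismatch']
```

Keeping each tool behind the same call signature is what lets maintenance personnel reorder or reuse steps without touching the tools themselves.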
Group Fault Analysis: Automatic Fault Location
A group fault refers to a quality abnormality that affects multiple services or network objects simultaneously and is caused by the same fault. These faults mainly occur at the aggregation and core layers of the network and are typically identified through customer complaints. A group fault can degrade service quality for multiple users, or even interrupt their services. Traditional restoration methods, which rely on manual analysis and location, are time-consuming and inefficient, and significantly affect customer satisfaction. ZTE’s group fault analysis tool integrates a closed-loop process encompassing service quality perception, fault commonality analysis, fault diagnosis, and fault restoration to rapidly locate group faults. Once a key service analysis task is initiated, the system monitors the quality of the service object in real time. When it identifies abnormal service quality, the system uses current system alarm information to analyze whether the abnormality is caused by a group fault. If so, the system starts a fault commonality analysis task to help O&M personnel quickly locate and resolve the fault. This functionality enables real-time network monitoring and group fault analysis within minutes, enhancing efficiency by over 90% and implementing proactive O&M in group fault scenarios.
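One simple way to picture fault commonality analysis is to count which network objects appear on the paths of most of the degraded services. The sketch below assumes that each service's path is known as a list of object names; the function `common_objects`, the `min_share` threshold, and the sample topology are hypothetical illustrations, not the algorithm used by the tool.

```python
from collections import Counter
from typing import Dict, List, Tuple


def common_objects(
    paths: Dict[str, List[str]],
    degraded: List[str],
    min_share: float = 0.8,
) -> List[Tuple[str, int]]:
    """Return network objects shared by at least min_share of the degraded services."""
    counts: Counter = Counter()
    for service in degraded:
        # Use a set so an object is counted once per service.
        counts.update(set(paths[service]))
    needed = min_share * len(degraded)
    return [(obj, n) for obj, n in counts.most_common() if n >= needed]


# Hypothetical topology: three access services converging on one aggregation node.
paths = {
    "svc1": ["ACC1", "AGG1", "CORE1"],
    "svc2": ["ACC2", "AGG1", "CORE1"],
    "svc3": ["ACC3", "AGG1", "CORE2"],
}
suspects = common_objects(paths, ["svc1", "svc2", "svc3"])
print(suspects)  # [('AGG1', 3)]
```

Here only `AGG1` lies on all three degraded paths, so it is the natural first candidate for diagnosis; a real system would also weigh alarm information, as the text describes.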
Maintaining Service Intent: Supporting Committable SLA
The most direct factor affecting customer experience is whether the SLA is continuously maintained to meet user expectations, which is also a critical competitive advantage for operators. The service intent maintenance feature introduced by ZTE is designed to fulfill this task.
ZENIC ONE (UME)’s service intent maintenance feature comprises three layers of closed-loop capabilities. The first layer is second-level service self-healing, where network-layer devices automatically trigger switching or rerouting of corresponding network objects upon identifying a service interruption, ensuring rapid resolution. The second layer, minute-level service restoration, kicks in when the network layer cannot implement second-level self-healing. The management and control system identifies service quality issues, locates and analyzes them, generates restoration policies, and automatically executes restoration commands within minutes. The third layer focuses on medium and long-term service optimization, predicting and analyzing service quality and traffic to preemptively address potential issues and optimize services.
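The division of work among the three layers can be sketched as a simple escalation rule. This is a simplified illustration under stated assumptions: the `Layer` enum, the `select_layer` function, and its three boolean inputs are hypothetical, and a real system would base the decision on much richer state.

```python
from enum import Enum


class Layer(Enum):
    SELF_HEALING = "second-level self-healing by network-layer devices"
    RESTORATION = "minute-level restoration by the management and control system"
    OPTIMIZATION = "medium and long-term service optimization"


def select_layer(interrupted: bool, protection_path_ok: bool, quality_degraded: bool) -> Layer:
    """Pick the closed-loop layer that should handle the current service state."""
    # Layer 1: devices switch or reroute on their own when an interruption occurs.
    if interrupted and protection_path_ok:
        return Layer.SELF_HEALING
    # Layer 2: the controller takes over when devices cannot self-heal,
    # or when quality has degraded without a full interruption.
    if interrupted or quality_degraded:
        return Layer.RESTORATION
    # Layer 3: no current issue, so continue prediction and optimization.
    return Layer.OPTIMIZATION


print(select_layer(True, False, False).name)  # RESTORATION
```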
At present, the service intent maintenance function at the first two layers has been successfully implemented and verified on existing networks. This function enables self-perception and self-healing of service quality, reducing restoration time from hours to minutes compared to traditional manual O&M. It keeps services “permanently online”, simplifies O&M, and enhances customer experience while meeting SLA requirements.
Liaoning Mobile and ZTE have made significant investments in network O&M to tackle challenges and overcome bottlenecks. The intelligent closed-loop guarantee system for SPN service quality, which integrates intelligent fault diagnosis, group fault analysis, and service intent maintenance functions, enhances O&M efficiency for Liaoning Mobile, shifting from passive to active O&M. Looking ahead, both parties will extend their cooperation to reinforce fault root cause analysis, implement a “one fault, one worksheet” approach, and simulate restoration solutions, thereby further enhancing O&M efficiency, expanding application scenarios, and leveraging intelligent precise O&M to achieve breakthroughs in network O&M.