Insights into AI Agent Technology in 2024

Release Date: 2024-05-16 By Du Yongsheng, Gao Yanqin

Less than three months after the release of ChatGPT 3.5 in early 2023, the paper on large language model (LLM)-powered autonomous agents was published, igniting immediate interest in AI agent technology. GPTs, customized versions of ChatGPT, were launched at the OpenAI Developer Conference in November 2023. At present, agents have become a mainstream product form for AI models.

Compared to general-purpose AI model products, agents, whether role-playing or task-specific, are technically easier to control. They offer more accurate outputs and are easier for users to understand and adjust. This article reviews agent technology to guide our follow-up research and work direction.

Current Technologies and Principles of Agents

From its initial structure-based definition to its current multi-modal model, the concept of an agent has gone through the phases of tools, social agents, workflows, multi-agent cooperation, and multiple modalities. Currently, an agent is defined as a virtual role built on an AI model, with the ability to learn, remember, perceive the environment, recall past experiences, plan target tasks, and execute them to influence the environment.

Each part of the agent, as shown in Fig. 1, is described as follows:

  • Thinking and skills: The agent receives task objectives from users, plans tasks using the LLM, and maps sub-tasks to the corresponding skills. Skills include instructions passed indirectly to other agents to carry out production tasks, and instructions executed directly in the production environment. Tasks can be decomposed in one pass after overall planning or iteratively as the work proceeds.
  • Environment and perception: The analysis and execution of a task depend on the context of its environment. To understand the environment, the agent first needs a modeling approach that converts real-world information into a machine-readable form. For example, the metaverse is an environment modeling approach for the physical world, while the digital twin solution in the communication industry is an environment modeling approach for the communication network.
  • Memory and learning: The agent learns from other agents and from environment feedback through imitation or reinforcement learning, and stores what it learns in memory, enabling it to resolve similar problems in the future. The ability to learn and adjust to changing environments plays a vital role in the self-evolution of agents.
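
The three parts described above can be sketched as a minimal agent loop. This is an illustrative sketch only, not ZTE's implementation: the `plan` function stands in for the LLM planner, the environment model is a plain dictionary, and "learning" is reduced to logging outcomes. All names (`Agent`, `plan`, `check_alarms`, `report_status`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def plan(objective: str) -> List[str]:
    """Hypothetical stand-in for the LLM planner: split an objective into sub-tasks."""
    return [step.strip() for step in objective.split(" then ") if step.strip()]


@dataclass
class Agent:
    # Skills: map the first word of a sub-task to a callable acting on the environment model.
    skills: Dict[str, Callable[[dict], str]]
    # Memory: (sub-task, outcome) pairs kept for later reuse.
    memory: List[Tuple[str, str]] = field(default_factory=list)

    def perceive(self, environment: dict) -> dict:
        # Environment modeling (assumed): a snapshot copy of a dictionary.
        return dict(environment)

    def run(self, objective: str, environment: dict) -> List[str]:
        state = self.perceive(environment)
        outcomes = []
        for sub_task in plan(objective):
            skill = self.skills.get(sub_task.split()[0])
            outcome = skill(state) if skill else f"no skill for '{sub_task}'"
            self.memory.append((sub_task, outcome))  # learning reduced to logging
            outcomes.append(outcome)
        return outcomes


def check_alarms(state: dict) -> str:
    return f"{len(state.get('alarms', []))} active alarm(s)"


def report_status(state: dict) -> str:
    return f"cell {state.get('cell_id', 'unknown')} reported"


agent = Agent(skills={"check": check_alarms, "report": report_status})
results = agent.run(
    "check alarms then report status",
    {"cell_id": "A1", "alarms": ["link down"]},
)
print(results)
```

A real agent would replace `plan` with an LLM call and the dictionary with a richer environment model such as a digital twin; the control flow of plan, map to skill, execute, and remember stays the same.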

 

Agent Value Analysis

First, the value of current agents relies mainly on LLMs, which are essentially conditional probability generation models. Given different prompts, for example for text generation, task decomposition, logical inference, or scenario understanding, LLMs generate different types of outputs. Based on this output capability, agents take on human-like roles that serve the production field.
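
The term "conditional probability generation model" can be written out with the standard autoregressive factorization used by decoder-style LLMs. The symbols are introduced here for illustration and do not appear in the original article: x is the prompt, y_t is the t-th output token, T is the output length, and θ denotes the model parameters.

```latex
p_\theta(y_1, \dots, y_T \mid x) = \prod_{t=1}^{T} p_\theta(y_t \mid x, y_1, \dots, y_{t-1})
```

Changing the prompt x changes the distribution being sampled, which is why a single model can produce text, task breakdowns, or inferences depending on how it is prompted.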

Second, from a mainstream industry perspective, the value of agents is embodied in a composite expert team made up of a human and multiple virtual members. This setup greatly expands the scope of work, enabling one person to do work that traditionally required several people. The paradigm has shifted from using tools directly to orchestrating multiple agents, which then use tools to complete tasks. Compared with conventional tools, LLM-based agents provide greater generality and flexibility in judgment and decision-making.

Experimental Results of ZTE

At present, ZTE has developed four types of agents based on its understanding of LLMs and the communications industry: an assurance assistant, an intelligent Q&A agent, a fault assistant, and a network observation assistant.

The assurance assistant, used in major event support scenarios, has a high degree of automation. It replicates real workflows in a virtual space, where key support experts, assistants, and troubleshooting experts work together to complete workflows automatically. They communicate with people through summarization, reporting, and risk assessment. This is a complex job agent, developed with the aim of achieving an L5 fully autonomous network.

The other three agents are technically task agents:

  • Intelligent Q&A: This agent builds a ToB-oriented knowledge base application on RAG+agent technology.
  • Fault assistant: Facilitated by the fault knowledge bases and APIs, this agent assists O&M personnel in quickly troubleshooting faults.
  • Network observation assistant: Utilizing both large and small models, multiple agents perform network analysis across various dimensions. They then send their findings to the general network insight agent, which summarizes them and outputs network observation results.
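
The network observation assistant follows a fan-out, fan-in pattern: several dimension agents analyze the same data, and a general insight agent summarizes their findings. The sketch below illustrates only that pattern. The KPI names, the thresholds (-110 dBm, 0.8), and the rule-based `insight_agent` that stands in for LLM summarization are all assumptions made for this example, not details of ZTE's product.

```python
from typing import Callable, Dict, List

KPIs = Dict[str, List[float]]
DimensionAgent = Callable[[KPIs], str]


def coverage_agent(kpis: KPIs) -> str:
    # Hypothetical rule: weak coverage when mean RSRP is below -110 dBm.
    mean_rsrp = sum(kpis["rsrp_dbm"]) / len(kpis["rsrp_dbm"])
    return "coverage: weak" if mean_rsrp < -110 else "coverage: normal"


def load_agent(kpis: KPIs) -> str:
    # Hypothetical rule: congestion when peak PRB utilization exceeds 80%.
    peak = max(kpis["prb_utilization"])
    return "load: congested" if peak > 0.8 else "load: normal"


def insight_agent(findings: List[str]) -> str:
    # Stand-in for LLM summarization: report only the abnormal findings.
    abnormal = [f for f in findings if not f.endswith("normal")]
    if not abnormal:
        return "Network healthy across all observed dimensions."
    return "Attention needed: " + "; ".join(abnormal)


def observe(kpis: KPIs, agents: List[DimensionAgent]) -> str:
    findings = [agent(kpis) for agent in agents]  # fan-out to dimension agents
    return insight_agent(findings)                # fan-in to the insight agent


kpis = {
    "rsrp_dbm": [-112.0, -115.0, -108.0],
    "prb_utilization": [0.42, 0.55, 0.61],
}
summary = observe(kpis, [coverage_agent, load_agent])
print(summary)  # Attention needed: coverage: weak
```

Because each dimension agent shares one signature, a new analysis dimension can be added by appending another function to the list without touching the insight agent.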

 

Agent Development Trends and Technology Breakdown  

At present, the mainstream types of agents in the academic community are consistent with the experimental results of ZTE. They are as follows:

  • Logic agent: This kind of agent generates language and multi-modal outputs based on its understanding of language and multi-modal inputs.
  • Task agent: Designed for specific tasks, this agent breaks down plans and performs corresponding operations. It lacks long-term memory during the process.
  • Job agent: Oriented to abstract work responsibilities and overarching objectives, this agent perceives the environment, remembers process status, and generates sub-objectives to advance the work.  
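
The difference between a task agent and a job agent comes down largely to remembered state. The following sketch contrasts the two under simplifying assumptions: the class names, the fixed plan-then-execute decomposition, and the boolean health flags in the environment are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class TaskAgent:
    """Handles one request at a time and keeps nothing between calls."""

    def handle(self, task: str) -> List[str]:
        # Hypothetical fixed decomposition: plan, then execute.
        return [f"plan {task}", f"execute {task}"]


@dataclass
class JobAgent:
    """Owns a standing responsibility and tracks progress across calls."""

    responsibility: str
    status: Dict[str, str] = field(default_factory=dict)  # remembered process status

    def next_sub_objective(self, environment: Dict[str, bool]) -> str:
        # Perceive the environment and pick the first unhealthy item not yet handled.
        for item, healthy in environment.items():
            if not healthy and self.status.get(item) != "handled":
                self.status[item] = "handled"
                return f"restore {item}"
        return "monitor"


job = JobAgent("keep site online")
env = {"power": True, "backhaul": False}
first = job.next_sub_objective(env)   # acts on the backhaul fault
second = job.next_sub_objective(env)  # remembers it was handled
print(first, "|", second)  # restore backhaul | monitor
```

A `TaskAgent` would return the same steps on every call, while the `JobAgent` changes its behavior because it remembers what it has already done.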

 

From the perspective of development trends, self-evolving agents are also important, as they can self-learn.

Mainstream agent products are categorized according to their technical level, as shown in Table 1.

We further surveyed the literature on the technologies listed in Table 1 and found the following:

  • Technology maturity analysis: Despite many papers on the technologies underlined in the table, mature solutions for industrial environments are lacking.
  • Technical problems analysis: Environment modeling and self-learning are the most difficult technologies to solve. Although they were proposed early on, there are still no good real-world solutions for them in physical production. In addition, their association with AI models is weak, so advancements in AI models have little impact on these technologies.
  • Technical potential analysis: Self-adaptive organization, exploration, intelligent prompts, memory, and dialogue have room for further development and may be what separates agent levels in the short term.
  • Development trend analysis: Based on the above analysis, task agents involve only one less mature technology. Job agents involve five immature technologies, including one difficult technology, environment modeling. The key technologies involved in self-evolving agents are all difficult. Therefore, task agents may develop fastest and offer the highest value.
  • Analysis of current products: Major products both domestically and internationally focus on the task agent type.

 

Insights About Agent Trends

Based on the above analysis, we can draw the following conclusions:

  • At the current stage, products focus on simple task agents, which use mature technologies and can be easily replicated and promoted. This aligns with our product experiments, and we anticipate a rapid increase in the number of such agents.
  • Under the conditions described above, the technologies that widen the gap between agents are memory and dialogue.
  • Developing a powerful individual agent is difficult because it involves reinforcement learning and environment modeling technologies. This is consistent with the input cost and outcomes observed in our experiments with environment modeling.

 

First, leveraging AI models, a simple task agent can provide inspiring information for other agents. Once the number of task agents is large enough, one of the two necessary conditions for the emergence of group intelligence is met. Second, with the abstract summarization capability of AI models, an agent within a team can combine multiple highly correlated information fragments from different agents, fulfilling the other necessary condition. Once both conditions are met, the phenomenon of group intelligence may start to emerge.
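
The second condition, combining highly correlated fragments from different agents, can be illustrated with a toy grouping step. This sketch swaps the AI model's abstract summarization for a much simpler technique, greedy grouping by Jaccard word overlap, purely to show the shape of the operation. The function names, the 0.3 threshold, and the sample fragments are assumptions for illustration.

```python
from typing import List, Tuple

Fragment = Tuple[str, str]  # (agent name, fragment text)


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def correlated_groups(
    fragments: List[Fragment], threshold: float = 0.3
) -> List[List[Fragment]]:
    groups: List[List[Fragment]] = []
    for frag in fragments:
        for group in groups:
            # Join the first group whose seed fragment is similar enough.
            if jaccard(group[0][1], frag[1]) >= threshold:
                group.append(frag)
                break
        else:
            groups.append([frag])
    # Keep only groups that combine fragments from more than one agent.
    return [g for g in groups if len({agent for agent, _ in g}) > 1]


fragments = [
    ("fault_agent", "link down on cell A1"),
    ("kpi_agent", "throughput drop on cell A1"),
    ("qa_agent", "user asks about billing"),
]
groups = correlated_groups(fragments)
print(groups)
```

Here the fault and KPI fragments share three of seven distinct words, so they are grouped, while the unrelated Q&A fragment is left out. In a real system, an LLM would both judge the correlation and write the combined summary.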

To sum up, through academic tracing, product experimentation, and technical decomposition of different types of agents, we have derived an insight: in the upcoming year, we expect a rapid increase in the number of ordinary agents, and the phenomenon of group intelligence may emerge before that of powerful agents.

Building on this insight, we need to further consider the following aspects:

  • Develop framework technologies with low learning costs and low technological thresholds to quickly generate agents; establish multi-agent collaboration and intelligent control centers to manage the emergence of group intelligence.
  • Track the key capabilities driving agent evolution, including environment modeling, memory, learning and design, and thoroughly explore the potential of memory.
  • Develop agent products related to enterprise data analysis and SOP workflows to bring benefits to enterprises.