With the introduction of ChatGPT, large language models (LLMs) have come into widespread use across both technology and non-technology industries. This popularity is mainly driven by two factors:
- LLMs as a store of knowledge: LLMs are trained on vast amounts of internet data and updated at regular intervals (e.g., GPT-3, GPT-3.5, GPT-4, GPT-4o).
- Emergent capabilities: As LLMs grow, they display capabilities not found in smaller models.
Does this mean we have already reached human-level intelligence, commonly called artificial general intelligence (AGI)? Gartner defines AGI as a form of AI that can understand, learn and apply knowledge across a wide range of tasks and domains. The road to AGI is long, and one key hurdle is the autoregressive nature of LLM training, which predicts the next word based on the past sequence. As Yann LeCun, one of the pioneers of AI research, points out, LLMs can drift away from accurate responses because of this autoregressive nature. As a result, LLMs have several limitations:
- Limited knowledge: Although LLMs are trained on vast amounts of data, their knowledge of the world is frozen at training time.
- Limited reasoning: LLMs have limited reasoning abilities. As Subbarao Kambhampati points out, LLMs are good knowledge retrievers but poor reasoners.
- No dynamism: LLMs are static and have no access to real-time information.
Overcoming the challenges of LLM requires a more sophisticated approach. This is where agents become important.
Agents to the rescue
The concept of intelligent agents in AI has evolved over two decades, with implementations changing over time. Today, agents are discussed in the context of LLMs. Simply put, an agent is like a Swiss Army knife for LLM applications: it can assist with reasoning, fetch up-to-date information from the internet (letting LLMs handle dynamic problems) and accomplish tasks autonomously. With an LLM as the backbone, an agent formally consists of tools, memory, reasoning (or planning) and action components.
AI agent components
- Tools: Tools allow agents to access external information such as the internet, databases and APIs to collect the data they need.
- Memory: Memory can be short-term or long-term. Agents use scratchpad memory to temporarily hold results from various sources, while chat history is an example of long-term memory.
- Reasoning: Reasoning allows agents to think methodically, breaking complex tasks into manageable subtasks.
- Actions: Agents carry out actions based on their environment and reasoning, adapting and solving tasks iteratively through feedback. ReAct is one popular method for alternating between reasoning and action.
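The components above can be sketched as a minimal ReAct-style loop. This is a toy illustration, not the ReAct paper's implementation: the LLM is stubbed with a scripted policy, and the tool registry and prompt markers (`ACT:`, `FINAL:`) are invented for the example, so only the reason-act-observe control flow is meaningful.

```python
def stub_llm(scratchpad: str) -> str:
    """Hypothetical stand-in for an LLM call. Returns either an
    action directive ('ACT: tool_name input') or a final answer."""
    if "Observation:" not in scratchpad:
        return "ACT: search capital of France"
    return "FINAL: The capital of France is Paris."

TOOLS = {
    # Toy tool registry; a real agent might wrap web search or a DB here.
    "search": lambda query: "Paris is the capital of France.",
}

def react_agent(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}"  # short-term (scratchpad) memory
    for _ in range(max_steps):
        thought = stub_llm(scratchpad)
        if thought.startswith("FINAL:"):
            return thought.removeprefix("FINAL:").strip()
        if thought.startswith("ACT:"):
            _, tool_name, tool_input = thought.split(" ", 2)
            observation = TOOLS[tool_name](tool_input)  # act, then observe
            scratchpad += f"\nObservation: {observation}"
    return "No answer within step budget."

print(react_agent("What is the capital of France?"))
```

Each iteration appends the tool's observation to the scratchpad, so the next "reasoning" step sees everything gathered so far.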
What are agents good at?
Agents leverage the enhanced performance of LLMs to excel at complex tasks, especially in role-playing mode. For example, when writing a blog post, one agent might focus on research and another on writing, each addressing a specific subgoal. This multi-agent approach applies to numerous real-world problems.
Role-playing helps agents stay focused on a specific task to achieve a larger goal, and it reduces hallucinations by clearly defining parts of the prompt such as the role, instructions and context. Since LLM performance depends on well-structured prompts, various frameworks formalize this process. One such framework, CrewAI, provides a structured approach to defining role-playing, as described next.
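The structure described here can be made concrete with a small helper that assembles a prompt from separated sections. The function name and section labels below are illustrative, not CrewAI's actual API; this only shows the role/instructions/context separation that such frameworks formalize.

```python
def build_role_prompt(role: str, instructions: str, context: str, task: str) -> str:
    """Assemble a role-playing prompt with clearly separated sections,
    mirroring the structure that frameworks like CrewAI formalize."""
    return (
        f"Role: {role}\n"
        f"Instructions: {instructions}\n"
        f"Context: {context}\n"
        f"Task: {task}"
    )

prompt = build_role_prompt(
    role="Research analyst",
    instructions="Gather only verifiable facts and note each source.",
    context="You are assisting a writer preparing a blog post on AI agents.",
    task="List three recent developments in multi-agent systems.",
)
print(prompt)
```

Keeping each section explicit makes it easy to audit what the agent was told, and to tighten one section (for example, the instructions) without touching the others.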
Multi-agent and single agent
Consider the example of retrieval-augmented generation (RAG) using a single agent. RAG is an effective way to leverage information from indexed documents so an LLM can handle domain-specific queries. However, single-agent RAG has its own limitations, such as retrieval performance and document ranking. Multi-agent RAG overcomes these limitations by employing agents specialized in document understanding, retrieval and ranking.
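A toy version of this split looks like the pipeline below. Each "agent" is a plain function standing in for an LLM-backed specialist, and the keyword-overlap retrieval is a deliberate simplification of what a real vector index would do; the function names and documents are invented for illustration.

```python
DOCS = [
    "Loan applicants must provide a government-issued ID.",
    "The bank offers fixed and variable mortgage rates.",
    "Applicants must document income for the last two years.",
]

def retrieval_agent(query: str, docs: list[str]) -> list[str]:
    # Keyword-overlap retrieval; a real agent would query a vector index.
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.lower().split())]

def ranking_agent(query: str, docs: list[str]) -> list[str]:
    # Rank by overlap size; a real agent might use an LLM re-ranker.
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(terms & set(d.lower().split())))

def answer_agent(query: str, ranked: list[str]) -> str:
    # Answer from the top-ranked document.
    top = ranked[0] if ranked else "No relevant document found."
    return f"Based on: {top}"

query = "What must loan applicants provide?"
hits = retrieval_agent(query, DOCS)
ranked = ranking_agent(query, hits)
print(answer_agent(query, ranked))
```

The point of the split is that each stage can be improved, swapped or evaluated independently, which is exactly what a single monolithic RAG agent makes difficult.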
In a multi-agent scenario, agents collaborate in ways that mirror distributed computing patterns: sequentially, through a centralized or decentralized topology, or via a shared message pool. Frameworks such as CrewAI, AutoGen and LangGraph plus LangChain enable multi-agent approaches to solving complex problems. This article uses CrewAI as the reference framework to explore autonomous workflow management.
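Of these collaboration patterns, the shared message pool is easy to sketch: agents publish findings under a topic and any other agent can read them, decoupling producers from consumers. The class below is a minimal illustration, not any framework's API.

```python
from collections import defaultdict

class SharedMessagePool:
    """Toy shared message pool: agents publish messages under topics
    and subscribers read them without knowing who produced what."""

    def __init__(self):
        self._messages = defaultdict(list)

    def publish(self, topic: str, sender: str, content: str) -> None:
        self._messages[topic].append({"sender": sender, "content": content})

    def read(self, topic: str) -> list[dict]:
        # Return a copy so readers cannot mutate the pool.
        return list(self._messages[topic])

pool = SharedMessagePool()
pool.publish("loan-123", "identity_agent", "ID verified against driver's license.")
pool.publish("loan-123", "finance_agent", "Income documents complete.")
for msg in pool.read("loan-123"):
    print(f"{msg['sender']}: {msg['content']}")
```

A centralized pattern would instead route every message through a coordinator agent; the pool avoids that bottleneck at the cost of agents having to poll for relevant topics.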
Workflow Management: Examples of Using Multi-Agent Systems
Most industrial processes revolve around managing workflows, whether loan processing, marketing campaign management or even DevOps. Achieving a specific goal requires sequential or cyclical steps. In a traditional approach, each step (such as loan application verification) requires a human to perform the tedious, mundane work of manually processing each application and validating it before moving on to the next step.
Each step requires input from an expert in that field. In a multi-agent setup with CrewAI, each step is handled by a crew of multiple agents. For example, in loan application verification, one agent verifies the applicant's identity through background checks on documents such as a driver's license, while another verifies the applicant's financial details.
This raises a question: Can a single crew (with multiple agents working in sequence or hierarchy) handle all the loan-processing steps? While possible, this would complicate the crew, demand extensive temporary memory and increase the risk of goal deviation and hallucination. A more effective approach is to treat each loan-processing step as a separate crew and represent the entire workflow as a graph of crew nodes (using a tool such as LangGraph) operating sequentially or cyclically.
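The graph-of-crews idea can be sketched with plain functions sharing a state dictionary, in the spirit of LangGraph but without its API: each node here stands in for a whole crew, and the node names, state keys and threshold are invented for the example.

```python
def identity_check(state: dict) -> dict:
    # Crew 1: verify identity documents (stubbed as a simple flag check).
    state["identity_ok"] = state["applicant"].get("id_document") == "valid"
    return state

def finance_check(state: dict) -> dict:
    # Crew 2: verify financial details (illustrative income threshold).
    state["finance_ok"] = state["applicant"].get("income", 0) >= 30_000
    return state

def decision(state: dict) -> dict:
    # Crew 3: combine the upstream verdicts.
    state["approved"] = state["identity_ok"] and state["finance_ok"]
    return state

WORKFLOW = [identity_check, finance_check, decision]  # sequential edges

def run_workflow(applicant: dict) -> dict:
    state = {"applicant": applicant}
    for node in WORKFLOW:
        state = node(state)  # each crew reads and extends the shared state
    return state

result = run_workflow({"id_document": "valid", "income": 45_000})
print(result["approved"])
```

Because each node only reads and writes the shared state, a cyclical edge (say, routing back to `identity_check` when documents are incomplete) or a human-review node can be added without changing the crews themselves.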
Since LLMs are still in the early stages of intelligence, full workflow management cannot be entirely autonomous. Humans are needed in the loop at key stages for end-user verification. For instance, after a crew completes the loan application verification step, human oversight is required to validate the results. Over time, as confidence in AI grows, some steps may become fully autonomous. For now, AI-based workflow management plays a supporting role, streamlining tedious tasks and reducing overall processing time.
Production challenges
Deploying multi-agent solutions into production can present several challenges.
- Scale: As the number of agents grows, collaboration and management become difficult. Various frameworks offer scalable solutions; for example, LlamaIndex uses event-driven workflows to manage multi-agent systems at scale.
- Latency: Agents often introduce latency because tasks execute iteratively and require multiple LLM calls. Managed LLMs (such as GPT-4o) are slowed by implicit guardrails and network delays. Self-hosted LLMs (with GPU control) can help address latency issues.
- Performance and hallucination issues: Because of the probabilistic nature of LLMs, agent performance can vary from run to run. Techniques such as output templating (for instance, JSON format) and providing ample examples in prompts can help reduce response variability. Training the agents can further reduce hallucinations.
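The output-templating technique from the last point can be sketched as a validate-and-retry wrapper. The LLM is stubbed here to simulate run-to-run variability (a free-text reply on the first attempt, valid JSON on the second); the function names and the `risk_level` schema are invented for the example.

```python
import json

def call_llm_stub(prompt: str, attempt: int) -> str:
    """Hypothetical LLM stand-in: violates the format on the first
    attempt, then returns valid JSON, simulating response variability."""
    if attempt == 0:
        return "Sure! The risk level is low."
    return '{"risk_level": "low", "reason": "stable income"}'

def get_structured_response(prompt: str, retries: int = 3) -> dict:
    # Constrain the model with an explicit JSON template, then
    # validate the reply and retry on failure.
    template = prompt + '\nRespond ONLY as JSON: {"risk_level": ..., "reason": ...}'
    for attempt in range(retries):
        raw = call_llm_stub(template, attempt)
        try:
            parsed = json.loads(raw)
            if "risk_level" in parsed:  # minimal schema check
                return parsed
        except json.JSONDecodeError:
            continue
    raise ValueError("No valid structured response after retries")

print(get_structured_response("Assess applicant risk."))
```

In production the schema check would typically be stricter (for example, a JSON Schema or Pydantic model), but the shape of the loop, template, parse, validate, retry, stays the same.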
Final thoughts
As Andrew Ng points out, agents are the future of AI, and they will continue to evolve alongside LLMs. Multi-agent systems will advance in processing multimodal data (text, images, video, audio) and tackle increasingly complex tasks. While AGI and fully autonomous systems are still on the horizon, multi-agent systems will bridge the current gap between LLMs and AGI.
Abhishek Gupta is Principal Data Scientist at Talentica Software.