
AI Agents for Non Techies

2025 is the year of AI Agents. If you’d like a quick grasp of what AI Agents are and how they are built, here is a simplified version of Google’s whitepaper on AI agents.

Three types of Superintelligence outlined by Nick Bostrom

What are agents?

AI Agents can execute a specific task until they arrive at the desired output. They are equipped with multiple tools to better understand the task and make independent, accurate decisions without supervision or a human in the loop. 

Agents are similar to a genie that executes a command. In Disney’s Aladdin, you’d see that once the genie starts executing, it’s difficult to stop. In his 2014 book, Superintelligence, Nick Bostrom describes three kinds of intelligent AI system: an Oracle, a Genie and a Sovereign. We’ve put together an infographic comparing these systems.

What are AI agents made up of?

AI Agents are typically made up of three layers: a Model layer, an Orchestration layer and a Tools layer.

Google's White paper Agent Architecture
Image shows the basic layers of an AI Agent

The Model Layer:

Think of the model layer as the CPU, central command or brain of the agent. The model layer consists of a small or large language model, such as the models behind ChatGPT. This language model can be general purpose, multimodal or a fine-tuned transformer model, and it supports the agent's decision-making process.

The Orchestration Layer:

The Orchestration layer has multiple components. In general, it collects input, processes and understands that input, and prepares the next step through logical reasoning [with the help of a cognitive architecture].

Step 1 : The agent collects input, much like our sensory organs collect information.

Step 2 : The collected information is processed through internal reasoning, much like our brain estimates the distance of an object by processing what our eyes see.

Step 3 : Once the information is processed and understood, the “next action” or “decision” the agent needs to take is formed inside the orchestration layer. This decision is made through logical reasoning, which in turn is guided by reasoning frameworks like ReAct, Chain-of-Thought and Tree-of-Thoughts.

Step 4 : This logical reasoning continues in a loop until the agent reaches its desired goal or output. The complexity of the Orchestration layer depends on the purpose of the agent.
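For readers who want to see the four steps as code, here is a minimal sketch of that loop in Python. It is an illustration only: `fake_model`, `search_tool` and `run_agent` are made-up names, and the "model" is a two-line rule rather than a real language model.

```python
from typing import Callable

def fake_model(goal: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the language model: returns (action, argument)."""
    if not observations:
        return ("search", goal)           # nothing known yet, so gather information
    return ("finish", observations[-1])   # something was found, so answer with it

def search_tool(query: str) -> str:
    """Hypothetical tool that returns a canned result."""
    return f"result for '{query}'"

TOOLS: dict[str, Callable[[str], str]] = {"search": search_tool}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):                              # Step 4: loop
        action, argument = fake_model(goal, observations)  # Steps 2 and 3: reason, decide
        if action == "finish":
            return argument
        observations.append(TOOLS[action](argument))       # Step 1: collect new input
    return "stopped: step limit reached"

answer = run_agent("weather in Bali")
```

The `max_steps` limit is a design choice in this sketch: it keeps the loop from running forever if the model never decides to finish.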

The Tools Layer:

The Tools layer complements the Model layer with access to real-world, contextualised information that helps the agent make relevant decisions. Some common examples of tools are Extensions, Functions and Data Stores.

Example: Let’s assume you are building an AI agent that helps your mom decide what dish to cook each day. You’ve chosen ChatGPT as your Model layer; it has been trained on a wide range of recipes, cookbooks and cuisines. Your mom asks the agent to come up with a recipe for breakfast, and it responds as shown below:

ChatGPT response

While the recommendations are great, your mom would dismiss them because they are generic: ChatGPT has no idea which ingredients are actually in your kitchen.

Now let’s assume you have a spreadsheet that tracks, in real time, the quantity of ingredients and vegetables in the refrigerator, along with a list of all the dishes mom made for breakfast, lunch and dinner over the last 100 days.

By supplying ChatGPT with this spreadsheet, we help it make more accurate menu recommendations. This is exactly what the Tools layer helps agents achieve: it supplies external information that helps the model understand the world better.
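The kitchen example can be sketched in a few lines of Python. Everything here is invented for illustration: the spreadsheet contents, the recipe list and the function names are assumptions, and a real agent would pass the resulting prompt to the model rather than stop there.

```python
import csv
import io

# Hypothetical spreadsheet export: ingredient and quantity currently in stock.
INVENTORY_CSV = """ingredient,quantity
eggs,6
bread,4
rice,0
tomato,3
"""

# Hypothetical recipes and the ingredients each one needs.
RECIPES = {
    "omelette": ["eggs", "tomato"],
    "fried rice": ["rice", "eggs"],
    "toast": ["bread"],
}

RECENT_MEALS = ["toast"]  # dishes cooked recently, taken from the same spreadsheet

def load_inventory(csv_text: str) -> dict[str, int]:
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ingredient"]: int(row["quantity"]) for row in reader}

def feasible_recipes(inventory, recipes, recent):
    # Keep dishes that were not cooked recently and whose ingredients are all in stock.
    return [
        name for name, needed in recipes.items()
        if name not in recent and all(inventory.get(item, 0) > 0 for item in needed)
    ]

inventory = load_inventory(INVENTORY_CSV)
options = feasible_recipes(inventory, RECIPES, RECENT_MEALS)
prompt = f"Suggest a breakfast using only these dishes: {', '.join(options)}"
```

With this made-up data, fried rice drops out because there is no rice, and toast drops out because it was cooked recently, so the model is only asked about dishes mom can actually make.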

What are the Different Types of Tools?

The scope of this article covers Google’s Agentic models and hence the different tool types given here are specific to Google models.

Tools are primarily classified into Extensions, Functions and Data Stores.

Extensions : Extensions bridge the gap between an agent and an API. Without one, you would need custom code that parses the user's input and makes the API call. Example : A user wants to book a flight to Bali, and the agent uses the Google Flights API to fetch the information.

Functions : Functions are self-contained modules of code that accomplish a specific task and can be reused when required. A model can have access to multiple functions and choose between them based on the situation. Functions are executed on the client side, while Extensions are executed on the agent side; running functions on the client side gives developers more control over the flow of data.

Functions can be a better choice than Extensions in situations such as the following:

  • The API call needs to be made from another layer of the application, outside the agent’s tech stack.
  • Security or authentication restrictions prevent the agent from calling the API directly.
  • The data returned by the API needs to be transformed before the agent can use it; a Function can handle that transformation.
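To make the client-side idea concrete, here is a rough Python sketch. It assumes the model replies with a function name and arguments as JSON; the `get_flights` function, the JSON shape and the flight data are all hypothetical, and the point is only that the client, not the agent, runs the code.

```python
import json

def get_flights(origin: str, destination: str) -> list[dict]:
    """Hypothetical client-side function; a real one would call a flights API."""
    return [{"from": origin, "to": destination, "price_usd": 420}]

# Functions the client application is willing to run on the model's behalf.
CLIENT_FUNCTIONS = {"get_flights": get_flights}

# What a model might emit: a function name and arguments, not an API call.
model_output = '{"name": "get_flights", "args": {"origin": "SIN", "destination": "DPS"}}'

def execute_on_client(raw: str):
    call = json.loads(raw)
    func = CLIENT_FUNCTIONS[call["name"]]  # raises KeyError for an unknown function
    return func(**call["args"])

flights = execute_on_client(model_output)
cheapest = min(flight["price_usd"] for flight in flights)
```

Because the lookup goes through `CLIENT_FUNCTIONS`, the client decides which functions exist and can add authentication or data clean-up around them.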

Data Stores : Data Stores allow developers to provide additional data to an agent in its original format, eliminating the need for time-consuming data transformations, model retraining or fine-tuning. RAG [Retrieval Augmented Generation] is one of the most widely used ways of connecting external datasets to a foundation model.

The Data Store usually converts this information into vector embeddings, stored in a vector database, which makes it easier for the agent to find the relevant information. Some common kinds of data that can be supplied to a Data Store are:

  • Website content
  • Structured data in formats like PDF, Word docs, spreadsheets, etc.
  • Unstructured data in formats like HTML, PDF, TXT, etc.
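Here is a toy Python sketch of the retrieval step behind a Data Store. It is a simplification: real systems use learned embeddings and a vector database, whereas this sketch uses plain word counts and a list, and the documents and function names are made up.

```python
import math
from collections import Counter

# Hypothetical documents held in the data store.
DOCUMENTS = [
    "eggs and tomato make a quick omelette",
    "book flights to Bali through the travel portal",
    "store rice in an airtight container",
]

def embed(text: str) -> Counter:
    # Toy "embedding": a count of each lowercase word.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

INDEX = [(doc, embed(doc)) for doc in DOCUMENTS]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

question = "what can I cook with eggs and tomato"
top = retrieve(question)
prompt = f"Context: {top[0]}\nQuestion: {question}"
```

The retrieved text is placed in the prompt next to the question, which is the "augmented" part of Retrieval Augmented Generation.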

Depending on the skill set or specialisation an agent needs for its task, the model's performance can be enhanced with learning techniques like in-context learning, retrieval-based in-context learning and fine-tuning based learning.
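In-context learning is the easiest of these to picture: a few worked examples are placed in the prompt, and the model picks up the pattern without any retraining. The Python sketch below only builds such a prompt; the examples and the function name are invented for illustration.

```python
# Hypothetical worked examples shown to the model inside the prompt.
EXAMPLES = [
    ("We have eggs and bread.", "French toast"),
    ("We have rice and yogurt.", "Curd rice"),
]

def build_few_shot_prompt(examples, question: str) -> str:
    parts = ["Suggest one dish for the ingredients given."]
    for ingredients, dish in examples:
        parts.append(f"Ingredients: {ingredients}\nDish: {dish}")
    # The final entry is left open for the model to complete.
    parts.append(f"Ingredients: {question}\nDish:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(EXAMPLES, "We have oats and milk.")
```

Retrieval-based in-context learning works the same way, except that the examples are looked up from an external store at request time instead of being hard-coded.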

Conclusion: The Agents are coming!

The future of agents looks bright, and many see them as a precursor to AGI [Artificial General Intelligence]. We’ll see more agents join the workforce going forward and drive profitability and ROI for businesses. Salesforce has already done this with their Customer Service Agents.

While agents are vulnerable to prompt injection and rife with data security and privacy questions, the ground is getting steadier around them, and we are going to see a mass exodus from traditional software towards task-specific agents. The future is equally exciting and scary.
