From Generative AI to Autonomous Agents: LangChain v...

TL;DR: Transitioning from simple LLM wrappers to autonomous agents requires a mindset shift. This post sets up LangChain v1.x, explains the ReAct reasoning model, and builds a fully runnable tool-calling agent.

The Problem with Traditional AI Applications

When you call an LLM directly, the interaction is stateless and one-shot:

text

[ User Prompt ] ---> [ Gemini ] ---> [ Static Text Response ]

This works fine for summarisation or Q&A. But real-world tasks are rarely that simple. Think about a request like "Check our inventory for SKU-99, and if stock is below 50 units, raise a procurement request". A single LLM call cannot do this — it has no way to query your inventory system, make decisions based on the result, and then trigger a downstream workflow.

That is exactly the gap agents fill.

An Autonomous Agent treats the LLM as a reasoning engine, not just a text generator. The model looks at the request, decides which tool to call, calls it, reads the result, and repeats this cycle until the task is complete.

text

+---> [ Reason: What do I need? ] ---+
                  |                                     |
[ User Request ] -+---> [ Act:  Call the right tool ] <-+--- Tool result fed back
                  |                                     |
                  +---> [ Observe: Is the job done? ] --+
                                    |
                              [ If yes: END ]

This loop is called ReAct (Reasoning + Acting) and it is the foundation of almost every LangChain agent you will build.

Setting Up Your Environment

Before writing any code, set up a clean Python environment. This isolates dependencies and avoids version conflicts with other projects on your machine.

bash

# Create and activate a virtual environment
python -m venv langchain-env
source langchain-env/bin/activate  # On Windows: langchain-env\Scripts\activate

# Install the required packages
pip install langchain langchain-google-genai langchain-community python-dotenv

Next, get a Gemini API key from Google AI Studio — it's free. Create a .env file in your project root:

bash

# .env
GOOGLE_API_KEY=your_actual_api_key_here

Why use a .env file instead of hardcoding the key? Hardcoding API keys directly in your script is a security risk — if you ever push that file to GitHub, your key is exposed. The python-dotenv library loads environment variables from .env at runtime, keeping credentials out of your source code.

Choosing Your Model: `init_chat_model` vs Direct Instantiation

LangChain v1.x introduced init_chat_model() as a provider-agnostic wrapper. Instead of importing a different class for each model provider, you use one unified function:

python

# create a file: 01_model_basics.py
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage

load_dotenv()

# Both models are initialised the same way — just change the string
gemini_flash = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
gemini_pro   = init_chat_model("gemini-3.5-flash",   model_provider="google-genai")

response = gemini_flash.invoke([HumanMessage(content="Explain what an AI agent is in two sentences.")])
print(response.content)

Run it:

bash

python 01_model_basics.py

Why not use ChatGoogleGenerativeAI directly? You can — and for quick scripts it is fine. But init_chat_model() shines in team environments. If your team decides to switch from Gemini to Claude or OpenAI, you change one string and zero downstream code. It also enables runtime model selection — useful for A/B testing models in production.

Understanding the Message Architecture

Agents do not just send strings back and forth. They communicate through a structured message history. Each message type tells the model something specific:

Message Type	Role	Example Use
`SystemMessage`	Sets model behaviour and constraints	`"You are a strict data analyst. Never guess."`
`HumanMessage`	The user's input	`"What is the stock level for SKU-99?"`
`AIMessage`	The model's response (may include tool call requests)	A text reply or a `tool_calls` payload
`ToolMessage`	The result returned from a tool execution	`"Stock: 42 units in Warehouse East"`

python

# create a file: 02_messages.py
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

# This is a manually constructed conversation history
# In a real agent, this list is built automatically as the agent runs
conversation = [
    SystemMessage(content="You are a concise data analyst. Respond only with bullet points."),
    HumanMessage(content="What is the difference between data replication and sharding?"),
]

# Load your model and invoke with the full history
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
load_dotenv()

model = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
response = model.invoke(conversation)
print(response.content)

Run it:

bash

python 02_messages.py

Why does message order matter? The model processes the list top-to-bottom. The SystemMessage must come first — it primes the model's behaviour before any user input. Getting this order wrong produces inconsistent results. Also note: the model does not actually "remember" previous conversations — you must pass the full history in every call.

Streaming and Batching: Production Patterns

Streaming improves perceived performance. Instead of waiting for the full response (which can take 5–10 seconds for long outputs), tokens are printed as they are generated.

python

# create a file: 03_streaming.py
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
load_dotenv()

model = init_chat_model("gemini-3.5-flash", model_provider="google-genai")

print("Streaming response:\n")
for chunk in model.stream("Write a 3-sentence explanation of how vector databases work."):
    print(chunk.content, end="", flush=True)
print()  # newline at the end

Run it:

bash

python 03_streaming.py

Batching runs multiple independent prompts concurrently instead of one after another. For applications that need to process many inputs, this is significantly faster:

python

# Append this to 03_streaming.py or create 03b_batching.py
prompts = [
    "Explain vector embeddings in one sentence.",
    "What is the role of an API Gateway?",
    "How does connection pooling improve database performance?",
]

responses = model.batch(prompts)
for i, r in enumerate(responses):
    print(f"\nQ{i+1}: {prompts[i]}")
    print(f"A{i+1}: {r.content.strip()}")

Tip — when to use each: Use streaming for user-facing chat interfaces where the user stares at a loading screen. Use batching for background processing pipelines — document analysis, report generation, classification jobs — where you have many independent inputs and care about total throughput.

Building Your First Runnable Agent

Now let's put it all together. We will build a real agent with a custom tool, a ReAct reasoning loop, and verbose output so you can watch every decision it makes.

python

# create a file: 04_agent.py
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

load_dotenv()

# Step 1: Define a tool — a normal Python function decorated with @tool
# The docstring is critical: the model reads it to decide WHEN to call this tool
@tool
def check_stock_inventory(sku_id: str) -> str:
    """
    Retrieves current stock levels and warehouse locations for a given SKU ID.
    Use this tool when the user asks about product availability, stock counts, or fulfillment.
    """
    sku = sku_id.strip().upper()
    if "SKU-99" in sku:
        return "Warehouse East: 42 units. Warehouse West: Out of stock."
    elif "SKU-10" in sku:
        return "Warehouse Central: 500 units. Overnight shipping available."
    else:
        return "SKU not found in catalog. Request forwarded to distribution team."

# Step 2: Load the standard ReAct prompt from LangChain Hub
# This prompt teaches the model HOW to reason, act, and observe in a loop
react_prompt = hub.pull("hwchase17/react")

# Step 3: Load the model and assemble the agent
model = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
agent_tools = [check_stock_inventory]

agent = create_react_agent(model, agent_tools, react_prompt)

# Step 4: Wrap in an executor — this actually runs the ReAct loop
executor = AgentExecutor(agent=agent, tools=agent_tools, verbose=True)

# Step 5: Run it — verbose=True prints every Thought/Action/Observation step
result = executor.invoke({"input": "Can we fulfil an order for 20 units of SKU-99 right now?"})
print("\n--- Final Answer ---")
print(result["output"])

Run it:

bash

python 04_agent.py

You will see the agent's full thought process printed step by step:

text

Thought: I need to check the inventory for SKU-99.
Action: check_stock_inventory
Action Input: SKU-99
Observation: Warehouse East: 42 units. Warehouse West: Out of stock.
Thought: 42 units is more than 20. The order can be fulfilled.
Final Answer: Yes, we can fulfil an order for 20 units from Warehouse East...

Why pull the ReAct prompt from LangChain Hub instead of writing our own? The Hub prompt was engineered and battle-tested across thousands of models. Writing your own ReAct prompt from scratch risks subtle issues — models getting stuck in loops, not formatting Action: and Observation: correctly, or skipping the thought step. Use the Hub prompt until you have a specific reason to customise it.

Tip — docstrings are the agent's instruction manual: The model decides which tool to call based entirely on the tool's docstring. A vague docstring like "Gets stock data" leads to unpredictable tool selection. Write precise, behaviour-driven descriptions — tell the model when to use the tool, not just what it does.

What You Built

In this post you went from zero to a functioning autonomous agent:

Environment setup with isolated dependencies and secure API key handling
Model initialisation with provider-agnostic init_chat_model()
Message architecture — why the order and type of messages matters
Streaming and batching — two production performance patterns
ReAct agent — a reasoning loop that selects and calls tools based on user intent

In Part 2, we move beyond single-loop agents and build stateful, graph-based workflows with LangGraph — giving you deterministic control over exactly how and when each step executes.

FAQs

Q: What is the primary difference between a simple LLM query and an autonomous agent?
A: A simple LLM query is a direct, stateless interaction where the model generates text based on prompt tokens. An autonomous agent uses the LLM as a reasoning engine to inspect user intent, dynamically select external tools, process the tool outcomes, and loop until it achieves the user's objective.

Q: Why use init_chat_model() instead of instantiating specific models directly?
A: init_chat_model() acts as a provider-agnostic factory. It allows you to switch between model providers (like Google Gemini, Anthropic Claude, or OpenAI GPT) using configuration variables rather than refactoring provider-specific import statements and client calls.

Q: Why are tool docstrings so important for agent accuracy?
A: Autonomous agents identify which tool to call based on the names and docstrings passed in the LLM's system instructions. If a tool has a generic or missing docstring, the model will fail to recognize when to invoke it, leading to hallucination or skipped execution loops.

From Generative AI to Autonomous Agents: LangChain v1.x Core, Part 1

The Problem with Traditional AI Applications

Setting Up Your Environment

Choosing Your Model: init_chat_model vs Direct Instantiation

Understanding the Message Architecture

Streaming and Batching: Production Patterns

Building Your First Runnable Agent

What You Built

FAQs

Choosing Your Model: `init_chat_model` vs Direct Instantiation