From Generative AI to Autonomous Agents: LangChain v1.x Core, Part 1
Learn how to transition from simple LLM wrappers to autonomous AI agents in LangChain v1.x. Build a fully runnable ReAct tool-calling agent with Gemini, understand the reasoning loop, and see why agents outperform raw LLM calls for real-world tasks.

TL;DR: Transitioning from simple LLM wrappers to autonomous agents requires a mindset shift. This post sets up LangChain v1.x, explains the ReAct reasoning model, and builds a fully runnable tool-calling agent.
The Problem with Traditional AI Applications
When you call an LLM directly, the interaction is stateless and one-shot:
[ User Prompt ] ---> [ Gemini ] ---> [ Static Text Response ]This works fine for summarisation or Q&A. But real-world tasks are rarely that simple. Think about a request like "Check our inventory for SKU-99, and if stock is below 50 units, raise a procurement request". A single LLM call cannot do this — it has no way to query your inventory system, make decisions based on the result, and then trigger a downstream workflow.
That is exactly the gap agents fill.
An Autonomous Agent treats the LLM as a reasoning engine, not just a text generator. The model looks at the request, decides which tool to call, calls it, reads the result, and repeats this cycle until the task is complete.
+---> [ Reason: What do I need? ] ---+
| |
[ User Request ] -+---> [ Act: Call the right tool ] <-+--- Tool result fed back
| |
+---> [ Observe: Is the job done? ] --+
|
[ If yes: END ]This loop is called ReAct (Reasoning + Acting) and it is the foundation of almost every LangChain agent you will build.
Setting Up Your Environment
Before writing any code, set up a clean Python environment. This isolates dependencies and avoids version conflicts with other projects on your machine.
# Create and activate a virtual environment
python -m venv langchain-env
source langchain-env/bin/activate # On Windows: langchain-env\Scripts\activate
# Install the required packages
pip install langchain langchain-google-genai langchain-community python-dotenvNext, get a Gemini API key from Google AI Studio — it's free. Create a .env file in your project root:
# .env
GOOGLE_API_KEY=your_actual_api_key_hereWhy use a
.envfile instead of hardcoding the key? Hardcoding API keys directly in your script is a security risk — if you ever push that file to GitHub, your key is exposed. Thepython-dotenvlibrary loads environment variables from.envat runtime, keeping credentials out of your source code.
Choosing Your Model: init_chat_model vs Direct Instantiation
LangChain v1.x introduced init_chat_model() as a provider-agnostic wrapper. Instead of importing a different class for each model provider, you use one unified function:
# create a file: 01_model_basics.py
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage
load_dotenv()
# Both models are initialised the same way — just change the string
gemini_flash = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
gemini_pro = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
response = gemini_flash.invoke([HumanMessage(content="Explain what an AI agent is in two sentences.")])
print(response.content)Run it:
python 01_model_basics.pyWhy not use
ChatGoogleGenerativeAIdirectly? You can — and for quick scripts it is fine. Butinit_chat_model()shines in team environments. If your team decides to switch from Gemini to Claude or OpenAI, you change one string and zero downstream code. It also enables runtime model selection — useful for A/B testing models in production.
Understanding the Message Architecture
Agents do not just send strings back and forth. They communicate through a structured message history. Each message type tells the model something specific:
| Message Type | Role | Example Use |
|---|---|---|
SystemMessage | Sets model behaviour and constraints | "You are a strict data analyst. Never guess." |
HumanMessage | The user's input | "What is the stock level for SKU-99?" |
AIMessage | The model's response (may include tool call requests) | A text reply or a tool_calls payload |
ToolMessage | The result returned from a tool execution | "Stock: 42 units in Warehouse East" |
# create a file: 02_messages.py
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
# This is a manually constructed conversation history
# In a real agent, this list is built automatically as the agent runs
conversation = [
SystemMessage(content="You are a concise data analyst. Respond only with bullet points."),
HumanMessage(content="What is the difference between data replication and sharding?"),
]
# Load your model and invoke with the full history
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
load_dotenv()
model = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
response = model.invoke(conversation)
print(response.content)Run it:
python 02_messages.pyWhy does message order matter? The model processes the list top-to-bottom. The
SystemMessagemust come first — it primes the model's behaviour before any user input. Getting this order wrong produces inconsistent results. Also note: the model does not actually "remember" previous conversations — you must pass the full history in every call.
Streaming and Batching: Production Patterns
Streaming improves perceived performance. Instead of waiting for the full response (which can take 5–10 seconds for long outputs), tokens are printed as they are generated.
# create a file: 03_streaming.py
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
load_dotenv()
model = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
print("Streaming response:\n")
for chunk in model.stream("Write a 3-sentence explanation of how vector databases work."):
print(chunk.content, end="", flush=True)
print() # newline at the endRun it:
python 03_streaming.pyBatching runs multiple independent prompts concurrently instead of one after another. For applications that need to process many inputs, this is significantly faster:
# Append this to 03_streaming.py or create 03b_batching.py
prompts = [
"Explain vector embeddings in one sentence.",
"What is the role of an API Gateway?",
"How does connection pooling improve database performance?",
]
responses = model.batch(prompts)
for i, r in enumerate(responses):
print(f"\nQ{i+1}: {prompts[i]}")
print(f"A{i+1}: {r.content.strip()}")Tip — when to use each: Use streaming for user-facing chat interfaces where the user stares at a loading screen. Use batching for background processing pipelines — document analysis, report generation, classification jobs — where you have many independent inputs and care about total throughput.
Building Your First Runnable Agent
Now let's put it all together. We will build a real agent with a custom tool, a ReAct reasoning loop, and verbose output so you can watch every decision it makes.
# create a file: 04_agent.py
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub
load_dotenv()
# Step 1: Define a tool — a normal Python function decorated with @tool
# The docstring is critical: the model reads it to decide WHEN to call this tool
@tool
def check_stock_inventory(sku_id: str) -> str:
"""
Retrieves current stock levels and warehouse locations for a given SKU ID.
Use this tool when the user asks about product availability, stock counts, or fulfillment.
"""
sku = sku_id.strip().upper()
if "SKU-99" in sku:
return "Warehouse East: 42 units. Warehouse West: Out of stock."
elif "SKU-10" in sku:
return "Warehouse Central: 500 units. Overnight shipping available."
else:
return "SKU not found in catalog. Request forwarded to distribution team."
# Step 2: Load the standard ReAct prompt from LangChain Hub
# This prompt teaches the model HOW to reason, act, and observe in a loop
react_prompt = hub.pull("hwchase17/react")
# Step 3: Load the model and assemble the agent
model = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
agent_tools = [check_stock_inventory]
agent = create_react_agent(model, agent_tools, react_prompt)
# Step 4: Wrap in an executor — this actually runs the ReAct loop
executor = AgentExecutor(agent=agent, tools=agent_tools, verbose=True)
# Step 5: Run it — verbose=True prints every Thought/Action/Observation step
result = executor.invoke({"input": "Can we fulfil an order for 20 units of SKU-99 right now?"})
print("\n--- Final Answer ---")
print(result["output"])Run it:
python 04_agent.pyYou will see the agent's full thought process printed step by step:
Thought: I need to check the inventory for SKU-99.
Action: check_stock_inventory
Action Input: SKU-99
Observation: Warehouse East: 42 units. Warehouse West: Out of stock.
Thought: 42 units is more than 20. The order can be fulfilled.
Final Answer: Yes, we can fulfil an order for 20 units from Warehouse East...Why pull the ReAct prompt from LangChain Hub instead of writing our own? The Hub prompt was engineered and battle-tested across thousands of models. Writing your own ReAct prompt from scratch risks subtle issues — models getting stuck in loops, not formatting
Action:andObservation:correctly, or skipping the thought step. Use the Hub prompt until you have a specific reason to customise it.
Tip — docstrings are the agent's instruction manual: The model decides which tool to call based entirely on the tool's docstring. A vague docstring like
"Gets stock data"leads to unpredictable tool selection. Write precise, behaviour-driven descriptions — tell the model when to use the tool, not just what it does.
What You Built
In this post you went from zero to a functioning autonomous agent:
- Environment setup with isolated dependencies and secure API key handling
- Model initialisation with provider-agnostic
init_chat_model() - Message architecture — why the order and type of messages matters
- Streaming and batching — two production performance patterns
- ReAct agent — a reasoning loop that selects and calls tools based on user intent
In Part 2, we move beyond single-loop agents and build stateful, graph-based workflows with LangGraph — giving you deterministic control over exactly how and when each step executes.
FAQs
Q: What is the primary difference between a simple LLM query and an autonomous agent?
A: A simple LLM query is a direct, stateless interaction where the model generates text based on prompt tokens. An autonomous agent uses the LLM as a reasoning engine to inspect user intent, dynamically select external tools, process the tool outcomes, and loop until it achieves the user's objective.
Q: Why use init_chat_model() instead of instantiating specific models directly?
A: init_chat_model() acts as a provider-agnostic factory. It allows you to switch between model providers (like Google Gemini, Anthropic Claude, or OpenAI GPT) using configuration variables rather than refactoring provider-specific import statements and client calls.
Q: Why are tool docstrings so important for agent accuracy?
A: Autonomous agents identify which tool to call based on the names and docstrings passed in the LLM's system instructions. If a tool has a generic or missing docstring, the model will fail to recognize when to invoke it, leading to hallucination or skipped execution loops.