This post is Part 1 of 6 in the series: LangChain v1.x Core Series

From Generative AI to Autonomous Agents (LangChain v1.x Core)

Understand the shift from static LLM calls to autonomous agents. Learn LangChain v1.x model patterns, message types, streaming, batching, and build your first runnable ReAct tool agent step by step.


The Problem with Traditional AI Applications

When you call an LLM directly, the interaction is stateless and one-shot:

text
[ User Prompt ] ---> [ Gemini ] ---> [ Static Text Response ]

This works fine for summarisation or Q&A. But real-world tasks are rarely that simple. Think about a request like "Check our inventory for SKU-99, and if stock is below 50 units, raise a procurement request". A single LLM call cannot do this — it has no way to query your inventory system, make decisions based on the result, and then trigger a downstream workflow.

That is exactly the gap agents fill.

An Autonomous Agent treats the LLM as a reasoning engine, not just a text generator. The model looks at the request, decides which tool to call, calls it, reads the result, and repeats this cycle until the task is complete.

text
                  +---> [ Reason: What do I need? ] ---+
                  |                                     |
[ User Request ] -+---> [ Act:  Call the right tool ] <-+--- Tool result fed back
                  |                                     |
                  +---> [ Observe: Is the job done? ] --+
                                    |
                              [ If yes: END ]

This loop is called ReAct (Reasoning + Acting) and it is the foundation of almost every LangChain agent you will build.
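The loop can be sketched in plain Python without any LLM at all. The snippet below is only an illustration: `fake_model`, `check_stock`, and `run_agent` are hypothetical names, and the "model" is a hard-coded function standing in for a real LLM's decision.

```python
from typing import Callable

def check_stock(sku: str) -> str:
    """Hypothetical tool: look up stock for a SKU."""
    return "42 units" if sku == "SKU-99" else "unknown SKU"

# Tool registry: tool name -> plain Python function
TOOLS: dict[str, Callable[[str], str]] = {"check_stock": check_stock}

def fake_model(history: list[str]) -> dict:
    """Stand-in for an LLM: picks the next step from the transcript so far."""
    if not any(line.startswith("Observation:") for line in history):
        return {"action": "check_stock", "input": "SKU-99"}
    return {"final": "Stock is 42, below 50: raise a procurement request."}

def run_agent(request: str, max_steps: int = 5) -> str:
    history = [f"Request: {request}"]
    for _ in range(max_steps):  # cap the loop so a confused model cannot run forever
        decision = fake_model(history)                          # Reason
        if "final" in decision:
            return decision["final"]
        result = TOOLS[decision["action"]](decision["input"])   # Act
        history.append(f"Observation: {result}")                # Observe
    return "Stopped: step limit reached."

print(run_agent("Check SKU-99 and reorder if stock is below 50."))
```

A real agent replaces `fake_model` with an LLM call, but the control flow (decide, call a tool, feed the result back, stop on a final answer or a step limit) has the same shape.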


Setting Up Your Environment

Before writing any code, set up a clean Python environment. This isolates dependencies and avoids version conflicts with other projects on your machine.

bash
# Create and activate a virtual environment
python -m venv langchain-env
source langchain-env/bin/activate  # On Windows: langchain-env\Scripts\activate

# Install the required packages
pip install langchain langchain-classic langchain-google-genai langchain-community python-dotenv

Next, get a Gemini API key from Google AI Studio (a free tier is available). Create a .env file in your project root:

bash
# .env
GOOGLE_API_KEY=your_actual_api_key_here

Why use a .env file instead of hardcoding the key? Hardcoding API keys directly in your script is a security risk — if you ever push that file to GitHub, your key is exposed. The python-dotenv library loads environment variables from .env at runtime, keeping credentials out of your source code.
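One way to make a missing key fail early with a readable message is a small guard like the one below. This is a sketch using only the standard library; `require_env` is a hypothetical helper, and the `setdefault` line exists only so the demo runs without a real key.

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value

# In a real script, call load_dotenv() first so values from .env reach os.environ.
os.environ.setdefault("GOOGLE_API_KEY", "demo-key-for-illustration")  # demo only
api_key = require_env("GOOGLE_API_KEY")
print(f"Key loaded ({len(api_key)} characters)")  # never print the key itself
```

Checking once at startup is usually easier to debug than an authentication error raised deep inside the first model call.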


Choosing Your Model: init_chat_model vs Direct Instantiation

LangChain v1.x provides init_chat_model() as a provider-agnostic factory function (it first appeared in an earlier release). Instead of importing a different class for each model provider, you use one unified function:

python
# create a file: 01_model_basics.py
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage

load_dotenv()

# Both models are initialised the same way; only the model ID string changes
gemini_flash = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
# Assumed Pro-tier ID for illustration; confirm the exact name in Google AI Studio
gemini_pro   = init_chat_model("gemini-3.5-pro", model_provider="google-genai")

response = gemini_flash.invoke([HumanMessage(content="Explain what an AI agent is in two sentences.")])
print(response.content)

Run it:

bash
python 01_model_basics.py

Why not use ChatGoogleGenerativeAI directly? You can — and for quick scripts it is fine. But init_chat_model() shines in team environments. If your team decides to switch from Gemini to Claude or OpenAI, you change one string and zero downstream code. It also enables runtime model selection — useful for A/B testing models in production.
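Runtime model selection for an A/B test could look roughly like this. The sketch only picks the strings you would hand to init_chat_model(); `VARIANTS`, `pick_variant`, and the second variant's names are placeholders, not real model IDs.

```python
import hashlib

# Hypothetical experiment arms: (model ID, provider) pairs for init_chat_model
VARIANTS = [
    ("gemini-3.5-flash", "google-genai"),        # control, as used in this post
    ("your-candidate-model", "your-provider"),   # placeholder for the challenger
]

def pick_variant(user_id: str, variants=VARIANTS):
    """Map a user to a variant deterministically so they always get the same model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return variants[digest[0] % len(variants)]

model_name, provider = pick_variant("user-1234")
print(model_name, provider)
# In the real app: model = init_chat_model(model_name, model_provider=provider)
```

Hashing the user ID, rather than choosing at random on each request, keeps a given user on one model for the whole experiment.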


Understanding the Message Architecture

Agents do not just send strings back and forth. They communicate through a structured message history. Each message type tells the model something specific:

| Message Type  | Role                                                    | Example Use                                    |
| SystemMessage | Sets model behaviour and constraints                    | "You are a strict data analyst. Never guess."  |
| HumanMessage  | The user's input                                        | "What is the stock level for SKU-99?"          |
| AIMessage     | The model's response (may include tool call requests)   | A text reply or a tool_calls payload           |
| ToolMessage   | The result returned from a tool execution               | "Stock: 42 units in Warehouse East"            |

python
# create a file: 02_messages.py
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

load_dotenv()

# This is a manually constructed conversation history
# In a real agent, this list is built automatically as the agent runs
conversation = [
    SystemMessage(content="You are a concise data analyst. Respond only with bullet points."),
    HumanMessage(content="What is the difference between data replication and sharding?"),
]

# Load your model and invoke with the full history
model = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
response = model.invoke(conversation)
print(response.content)

Run it:

bash
python 02_messages.py

Why does message order matter? The model processes the list top-to-bottom. The SystemMessage must come first — it primes the model's behaviour before any user input. Getting this order wrong produces inconsistent results. Also note: the model does not actually "remember" previous conversations — you must pass the full history in every call.
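Because the model keeps no memory between calls, your code has to carry the history. A minimal sketch of that bookkeeping follows, using plain (role, content) tuples and a stand-in `echo_model` instead of a real model; `new_conversation` and `ask` are hypothetical helpers. LangChain chat models generally accept this tuple form as shorthand for the message classes, but treat that as an assumption to verify against your installed version.

```python
def new_conversation(system_prompt: str) -> list[tuple[str, str]]:
    """Start a history with the system message in first position."""
    return [("system", system_prompt)]

def ask(history, user_text, model_call):
    """Append the user turn, send the FULL history, then record the reply."""
    history.append(("human", user_text))
    reply = model_call(history)
    history.append(("ai", reply))
    return reply

def echo_model(history):
    """Stand-in for model.invoke: reports how many messages it was given."""
    return f"I received {len(history)} messages."

history = new_conversation("You are a concise data analyst.")
print(ask(history, "What is sharding?", echo_model))
print(ask(history, "And replication?", echo_model))
```

The second call sees the system message, the first question, and the first reply as well as the new question, which is the only reason a follow-up like "And replication?" can be understood in context.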


Streaming and Batching: Production Patterns

Streaming improves perceived performance. Instead of waiting for the full response (which can take 5–10 seconds for long outputs), tokens are printed as they are generated.

python
# create a file: 03_streaming.py
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
load_dotenv()

model = init_chat_model("gemini-3.5-flash", model_provider="google-genai")

print("Streaming response:\n")
for chunk in model.stream("Write a 3-sentence explanation of how vector databases work."):
    print(chunk.content, end="", flush=True)
print()  # newline at the end

Run it:

bash
python 03_streaming.py

Batching runs multiple independent prompts concurrently instead of one after another. For applications that need to process many inputs, this is significantly faster:

python
# Append this to 03_streaming.py, or create 03b_batching.py and repeat the imports and model setup from above
prompts = [
    "Explain vector embeddings in one sentence.",
    "What is the role of an API Gateway?",
    "How does connection pooling improve database performance?",
]

responses = model.batch(prompts)
for i, r in enumerate(responses):
    print(f"\nQ{i+1}: {prompts[i]}")
    print(f"A{i+1}: {r.content.strip()}")

Tip: when to use each. Use streaming for user-facing chat interfaces, where the user would otherwise stare at a loading screen until the full response arrives. Use batching for background processing pipelines (document analysis, report generation, classification jobs) where you have many independent inputs and care about total throughput.
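To see why batching helps, here is a standard-library sketch of the same idea: independent, network-bound calls run in a thread pool instead of one after another. `slow_answer` and `run_batch` are hypothetical stand-ins, not LangChain functions. With a real model you would keep using model.batch(), which also accepts a config such as {"max_concurrency": 2} to cap parallel requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_answer(prompt: str) -> str:
    """Stand-in for a network-bound model call."""
    time.sleep(0.2)
    return f"answer to: {prompt}"

def run_batch(prompts, max_concurrency=3):
    """Run all prompts concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(slow_answer, prompts))

prompts = ["q1", "q2", "q3"]
start = time.perf_counter()
answers = run_batch(prompts)
elapsed = time.perf_counter() - start

print(answers)
print(f"{elapsed:.2f}s for {len(prompts)} prompts")
```

Lowering `max_concurrency` trades throughput for gentler load on the provider, which matters when you are close to a rate limit.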


Building Your First Runnable Agent

Now let's put it all together. We will build a real agent with a custom tool, a ReAct reasoning loop, and verbose output so you can watch every decision it makes.

python
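# create a file: 04_agent.py
# Note: in LangChain 1.x the legacy ReAct helpers below ship in the
# langchain-classic package (pip install langchain-classic). On 0.3.x,
# import them from langchain.agents and langchain instead.
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langchain_classic import hub
from langchain_classic.agents import AgentExecutor, create_react_agent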

load_dotenv()

# Step 1: Define a tool — a normal Python function decorated with @tool
# The docstring is critical: the model reads it to decide WHEN to call this tool
@tool
def check_stock_inventory(sku_id: str) -> str:
    """
    Retrieves current stock levels and warehouse locations for a given SKU ID.
    Use this tool when the user asks about product availability, stock counts, or fulfillment.
    """
    sku = sku_id.strip().upper()
    if "SKU-99" in sku:
        return "Warehouse East: 42 units. Warehouse West: Out of stock."
    elif "SKU-10" in sku:
        return "Warehouse Central: 500 units. Overnight shipping available."
    else:
        return "SKU not found in catalog. Request forwarded to distribution team."

# Step 2: Load the standard ReAct prompt from LangChain Hub
# This prompt teaches the model HOW to reason, act, and observe in a loop
react_prompt = hub.pull("hwchase17/react")

# Step 3: Load the model and assemble the agent
model = init_chat_model("gemini-3.5-flash", model_provider="google-genai")
agent_tools = [check_stock_inventory]

agent = create_react_agent(model, agent_tools, react_prompt)

# Step 4: Wrap in an executor — this actually runs the ReAct loop
executor = AgentExecutor(agent=agent, tools=agent_tools, verbose=True)

# Step 5: Run it — verbose=True prints every Thought/Action/Observation step
result = executor.invoke({"input": "Can we fulfil an order for 20 units of SKU-99 right now?"})
print("\n--- Final Answer ---")
print(result["output"])

Run it:

bash
python 04_agent.py

You will see the agent's full thought process printed step by step:

text
Thought: I need to check the inventory for SKU-99.
Action: check_stock_inventory
Action Input: SKU-99
Observation: Warehouse East: 42 units. Warehouse West: Out of stock.
Thought: 42 units is more than 20. The order can be fulfilled.
Final Answer: Yes, we can fulfil an order for 20 units from Warehouse East...

Why pull the ReAct prompt from LangChain Hub instead of writing our own? The Hub prompt is widely used and has been exercised against many different models. Writing your own ReAct prompt from scratch risks subtle issues: models getting stuck in loops, not formatting Action: and Observation: correctly, or skipping the thought step. Use the Hub prompt until you have a specific reason to customise it.

Tip: docstrings are the agent's instruction manual. The model chooses a tool largely from its name and docstring, which together form the tool's description in the prompt. A vague docstring like "Gets stock data" leads to unpredictable tool selection. Write precise, behaviour-driven descriptions that tell the model when to use the tool, not just what it does.
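To get a feel for what the model actually "sees" of a tool, the sketch below builds a simplified description from a function's name, signature, and docstring using only the standard library. `describe_tool` is a hypothetical helper written for illustration; LangChain's @tool decorator performs its own, richer schema extraction.

```python
import inspect

def check_stock_inventory(sku_id: str) -> str:
    """Retrieves current stock levels for a given SKU ID.
    Use this tool when the user asks about product availability or stock counts."""
    return "stub"

def describe_tool(func) -> dict:
    """Build a simplified tool description from name, signature, and docstring."""
    params = {
        name: getattr(p.annotation, "__name__", str(p.annotation))
        for name, p in inspect.signature(func).parameters.items()
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": params,
    }

print(describe_tool(check_stock_inventory))
```

If the description printed here would not tell a new colleague when to call the function, it is unlikely to guide the model either.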


What You Built

In this post you went from zero to a functioning autonomous agent:

  • Environment setup with isolated dependencies and secure API key handling
  • Model initialisation with provider-agnostic init_chat_model()
  • Message architecture — why the order and type of messages matters
  • Streaming and batching — two production performance patterns
  • ReAct agent — a reasoning loop that selects and calls tools based on user intent

In Part 2, we move beyond single-loop agents and build stateful, graph-based workflows with LangGraph — giving you deterministic control over exactly how and when each step executes.
