OpenAI Agents SDK: A Lightweight Python Framework for Multi‑Agent Workflows
Deep dive into OpenAI's newly released Agents SDK—a lightweight, production‑ready Python framework for orchestrating multi‑agent workflows with built‑in tool‑calling, memory management, and real‑time streaming.
Vijayaragupathy
AI Engineer

Introduction
In April 2026, OpenAI released the OpenAI Agents SDK (openai‑agents‑python), a lightweight yet powerful Python framework designed to simplify the development of multi‑agent workflows. Positioned as a production‑ready alternative to heavier orchestration frameworks, it provides built‑in support for tool‑calling, memory management, sandboxed execution, and real‑time streaming.
This post walks through the SDK's architecture, its core design patterns, and how it enables developers to build scalable, secure agentic systems. All insights are based on a direct analysis of the source code using Gemini CLI's @codebase‑investigator.
Core Architecture
The SDK is built around a Runner‑Agent‑Tool model that emphasizes simplicity and extensibility.
- Runner: The central orchestrator that manages the execution loop, context window, and session state. It supports both synchronous (`run`) and streaming (`run_streamed`) execution.
- Agent: A stateless function (or class) that receives a `RunState` and returns a `RunStep`. Agents can be composed, nested, and delegated to via tool‑based handoffs.
- Tool: A Python function decorated with `@tool`. The SDK automatically generates an OpenAPI‑compliant schema from the function's signature and docstring, enabling seamless integration with LLM tool‑calling.
Multi‑Agent Orchestration
The SDK's most distinctive feature is its tool‑based handoff pattern. Instead of hard‑coding agent‑to‑agent communication, you define tools that agents can call, and those tools can delegate work to other agents.
Example: Delegation via Tools
```python
from openai.agents import tool, Runner

@tool
def research_topic(topic: str) -> str:
    """Research a topic and return a summary."""
    # This tool can internally call another agent
    research_agent = create_research_agent()
    result = Runner.run(research_agent, f"Research {topic}")
    return result.final_output

@tool
def write_blog_post(summary: str) -> str:
    """Write a blog post based on a research summary."""
    writer_agent = create_writer_agent()
    return Runner.run(writer_agent, f"Write a blog post about: {summary}").final_output

# Orchestrator agent can call both tools
orchestrator = create_orchestrator_agent(tools=[research_topic, write_blog_post])
result = Runner.run(orchestrator, "Write a blog post about quantum computing")
```

This pattern decouples agent logic, simplifies testing, and enables dynamic workflow composition.
Tool‑Calling & Schema Generation
The SDK eliminates the boilerplate of manually defining tool schemas. It uses Python's type hints and docstrings to generate accurate, LLM‑friendly schemas.
Code‑First Tool Definition
```python
from openai.agents import tool
from pydantic import BaseModel

class SearchQuery(BaseModel):
    query: str
    max_results: int = 5

@tool
def web_search(query: SearchQuery) -> list[str]:
    """
    Perform a web search.

    Args:
        query: The search query with optional result limit.

    Returns:
        A list of search result snippets.
    """
    # Implementation...
    return ["Result 1", "Result 2"]
```

The SDK automatically:
- Converts the `SearchQuery` Pydantic model into a JSON Schema.
- Extracts the docstring for the tool description.
- Handles optional parameters with defaults.
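To make the schema‑from‑code idea concrete, here is a standalone sketch of the underlying mechanism, built only on Python's `inspect` and `typing` modules. It is not the SDK's actual implementation; the `tool_schema` helper and its output shape are illustrative assumptions.

```python
import inspect
import typing

# Minimal mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn) -> dict:
    """Build an LLM-friendly JSON Schema for a function from its
    signature and docstring (a simplified sketch, not the SDK's code)."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        prop = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)       # no default => required parameter
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

def web_search(query: str, max_results: int = 5) -> list:
    """Perform a web search."""
    return []

schema = tool_schema(web_search)
```

Even this toy version shows why the pattern eliminates boilerplate: the function signature is the single source of truth for the schema, so the two can never drift apart.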
Memory & Session Management
The SDK provides a session‑based memory system that persists conversation context across turns and supports pluggable storage backends.
Session Protocol
The `Session` protocol defines methods for loading and saving `RunState`. Implementations include:
- `InMemorySession`: For single‑process, ephemeral conversations.
- `RedisSession`: For distributed, multi‑instance deployments.
- `OpenAISession`: Synchronizes local state with OpenAI's Responses API using `conversation_id`.
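Because the protocol is just a load/save pair, writing a custom backend is straightforward. Below is a hypothetical file‑backed session that follows the same shape; the exact protocol types are assumed, so state is modeled as a plain dict for illustration.

```python
import json
import os
import tempfile
from pathlib import Path

class FileSession:
    """Sketch of a custom Session backend: persists run state as JSON
    on disk, mirroring the load()/save() shape described above.
    (The real protocol's types are assumed; state is a plain dict.)"""

    def __init__(self, path: str):
        self.path = Path(path)

    def load(self) -> dict:
        # Return existing state, or a fresh one on first use.
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {"messages": []}

    def save(self, state: dict) -> None:
        self.path.write_text(json.dumps(state))

# Usage mirrors the built-in sessions:
path = os.path.join(tempfile.gettempdir(), "agent_session.json")
session = FileSession(path)
state = session.load()
state["messages"].append({"role": "user", "content": "Hello!"})
session.save(state)
```

Any storage that can round‑trip the state (a database row, an S3 object, a browser cookie) can slot in the same way, which is the point of keeping the protocol this small.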
```python
from openai.agents import Runner, InMemorySession

session = InMemorySession()
state = session.load()  # Loads existing RunState or creates new

# Run agent within the session
result = Runner.run(agent, "Hello!", state=state)

# Save updated state
session.save(state)
```

Context Window Management
The Runner automatically manages the LLM context window by:
- Truncating older messages when the token limit is approached.
- Preserving system prompts and critical metadata.
- Supporting custom truncation strategies via the `TruncationPolicy` interface.
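A minimal sketch of such a truncation strategy, independent of the SDK, looks like this: preserve system messages, then keep the newest turns that fit a (naively estimated) token budget. The 4‑characters‑per‑token heuristic is an assumption for illustration; a real policy would use the model's tokenizer.

```python
def truncate(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep system messages, then the newest non-system turns that fit
    the budget (a toy truncation policy, not the SDK's implementation)."""
    def est_tokens(msg: dict) -> int:
        # Rough heuristic: ~4 characters per token.
        return max(1, len(msg["content"]) // 4)

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(est_tokens(m) for m in system)

    kept, used = [], 0
    for msg in reversed(rest):          # walk newest-first
        cost = est_tokens(msg)
        if used + cost > budget:
            break                       # oldest messages fall off
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
trimmed = truncate(history, max_tokens=20)
```

Dropping from the oldest end while pinning the system prompt is the simplest policy; a production `TruncationPolicy` might instead summarize evicted turns or pin tool results.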
Sandboxing & Security
The SDK enforces security boundaries through explicit executors for local tools, forcing developers to think about isolation upfront.
Hosted Sandboxing
For code execution, the CodeInterpreterTool runs within OpenAI's managed, isolated containers, providing a safe environment for arbitrary Python code.
Local Sandboxing
Tools like LocalShellTool and ShellTool require a developer‑provided executor—a callable that defines the execution environment (e.g., a Docker container, a restricted shell, or a virtual machine).
```python
from openai.agents.tools import LocalShellTool
import subprocess

def docker_executor(command: str) -> str:
    """Run command inside a Docker container."""
    result = subprocess.run(
        ["docker", "run", "--rm", "python:3.12", "sh", "-c", command],
        capture_output=True, text=True
    )
    return result.stdout

shell_tool = LocalShellTool(executor=docker_executor)
```

This “security by design” approach ensures that local tool execution is always sandboxed according to the developer's specifications.
Real‑Time & Streaming Capabilities
For low‑latency applications (voice, interactive chat), the SDK provides a RealtimeAgent and WebSocket‑based sessions.
WebSocket Pooling
The responses_websocket_session helper keeps WebSocket connections warm across multiple turns, significantly reducing the Time to First Token (TTFT).
```python
from openai.agents import responses_websocket_session

async with responses_websocket_session() as session:
    result = session.run_streamed(agent, "Hello!")
    async for event in result:
        if event.type == "text_delta":
            print(event.text, end="")  # Stream tokens as they arrive
```

Event‑Driven Execution
The `Runner.run_streamed()` method yields a stream of semantic events:
- `reasoning_step`: The agent's internal reasoning (if using a reasoning model).
- `tool_call`: Invocation of a tool with arguments.
- `tool_result`: The result of a tool call.
- `text_delta`: Incremental text output.
This enables rich, real‑time UI updates and progressive rendering.
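A consumer of such a stream typically dispatches on the event's type field. The sketch below uses hypothetical dataclasses standing in for the SDK's event objects (their field names are assumptions) to show how a UI layer might progressively render the stream.

```python
from dataclasses import dataclass

# Hypothetical event shapes mirroring the semantic event types above;
# the real SDK's event objects may differ.
@dataclass
class ToolCall:
    type: str
    name: str
    arguments: dict

@dataclass
class TextDelta:
    type: str
    text: str

def render(events) -> str:
    """Progressively render a stream of semantic events,
    dispatching on each event's type field."""
    output = []
    for event in events:
        if event.type == "tool_call":
            output.append(f"[calling {event.name}]")   # show tool activity
        elif event.type == "text_delta":
            output.append(event.text)                  # append text as it arrives
    return "".join(output)

stream = [
    ToolCall("tool_call", "web_search", {"query": "quantum computing"}),
    TextDelta("text_delta", "Quantum "),
    TextDelta("text_delta", "computing is..."),
]
print(render(stream))  # [calling web_search]Quantum computing is...
```

In an interactive app the same dispatch loop would run inside the `async for` shown earlier, flushing each delta to the screen instead of accumulating a string.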
Summary of Design Patterns
| Component | Design Pattern | Benefit |
|---|---|---|
| Orchestration | Tool‑based Handoffs | Decouples agent logic; simplifies delegation. |
| Tool Execution | Schema‑from‑Code | Reduces boilerplate; ensures type safety. |
| Memory | Session Protocol | Pluggable storage; context window management. |
| Interactivity | Event Streaming | Supports low‑latency, real‑time UX. |
| Security | Explicit Executors | Forces intentional security boundaries for local tools. |
Conclusion
The OpenAI Agents SDK represents a pragmatic, production‑oriented approach to multi‑agent development. By focusing on lightweight orchestration, automatic schema generation, and built‑in security boundaries, it lowers the barrier to building reliable agentic workflows.
What’s next? The SDK is still in early development (v0.14.0 as of April 2026), but its open‑source MIT license and modular design make it a compelling foundation for teams scaling AI agent deployments. For AI engineers, understanding these patterns is essential as we move from prototype to production.
This post was researched using Brave search and analyzed with Gemini CLI's @codebase‑investigator subagent, which cloned the openai/openai‑agents‑python repository and extracted the architectural insights and code examples shown above.