Engineering
6 min read

OpenAI Agents SDK: A Lightweight Python Framework for Multi‑Agent Workflows

Deep dive into OpenAI's newly released Agents SDK—a lightweight, production‑ready Python framework for orchestrating multi‑agent workflows with built‑in tool‑calling, memory management, and real‑time streaming.

Vijayaragupathy

AI Engineer

Published
April 22, 2026

Introduction

In April 2026, OpenAI released the OpenAI Agents SDK (openai‑agents‑python), a lightweight yet powerful Python framework designed to simplify the development of multi‑agent workflows. Positioned as a production‑ready alternative to heavier orchestration frameworks, it provides built‑in support for tool‑calling, memory management, sandboxed execution, and real‑time streaming.

This post walks through the SDK's architecture, its core design patterns, and how it enables developers to build scalable, secure agentic systems. All insights are based on a direct analysis of the source code using Gemini CLI's @codebase‑investigator.

Core Architecture

The SDK is built around a Runner‑Agent‑Tool model that emphasizes simplicity and extensibility.

  • Runner: The central orchestrator that manages the execution loop, context window, and session state. It supports both synchronous (run) and streaming (run_streamed) execution.
  • Agent: A stateless function (or class) that receives a RunState and returns a RunStep. Agents can be composed, nested, and delegated using tool‑based handoffs.
  • Tool: A Python function decorated with @tool. The SDK automatically generates an OpenAPI‑compliant schema from the function's signature and docstring, enabling seamless integration with LLM tool‑calling.
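The Runner-Agent-Tool loop described above can be illustrated with a minimal, self-contained sketch. This is not the SDK's actual implementation; the `RunState`/`RunStep` shapes and the loop body are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the SDK's RunState/RunStep types,
# used only to illustrate the Runner-Agent-Tool model.
@dataclass
class RunState:
    messages: list = field(default_factory=list)

@dataclass
class RunStep:
    output: str
    done: bool = False

def run(agent, state: RunState, user_input: str) -> str:
    """Minimal execution loop: record the input, step the agent until done."""
    state.messages.append(("user", user_input))
    while True:
        step = agent(state)
        state.messages.append(("agent", step.output))
        if step.done:
            return step.output

def echo_agent(state: RunState) -> RunStep:
    # A stateless agent: read the latest user message, return a final step.
    last = next(m for role, m in reversed(state.messages) if role == "user")
    return RunStep(output=f"echo: {last}", done=True)

print(run(echo_agent, RunState(), "hello"))  # → echo: hello
```

The key property this sketch captures is that the agent itself holds no state: everything it needs arrives in the `RunState`, which is what makes agents composable and easy to test.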

Multi‑Agent Orchestration

The SDK's most distinctive feature is its tool‑based handoff pattern. Instead of hard‑coding agent‑to‑agent communication, you define tools that agents can call, and those tools can delegate work to other agents.

Example: Delegation via Tools

from openai.agents import tool, Runner
 
@tool
def research_topic(topic: str) -> str:
    """Research a topic and return a summary."""
    # This tool can internally call another agent
    research_agent = create_research_agent()
    result = Runner.run(research_agent, f"Research {topic}")
    return result.final_output
 
@tool
def write_blog_post(summary: str) -> str:
    """Write a blog post based on a research summary."""
    writer_agent = create_writer_agent()
    return Runner.run(writer_agent, f"Write a blog post about: {summary}").final_output
 
# Orchestrator agent can call both tools
orchestrator = create_orchestrator_agent(tools=[research_topic, write_blog_post])
result = Runner.run(orchestrator, "Write a blog post about quantum computing")

This pattern decouples agent logic, simplifies testing, and enables dynamic workflow composition.

Tool‑Calling & Schema Generation

The SDK eliminates the boilerplate of manually defining tool schemas. It uses Python's type hints and docstrings to generate accurate, LLM‑friendly schemas.

Code‑First Tool Definition

from openai.agents import tool
from pydantic import BaseModel
 
class SearchQuery(BaseModel):
    query: str
    max_results: int = 5
 
@tool
def web_search(query: SearchQuery) -> list[str]:
    """
    Perform a web search.
 
    Args:
        query: The search query with optional result limit.
    Returns:
        A list of search result snippets.
    """
    # Implementation...
    return ["Result 1", "Result 2"]

The SDK automatically:

  1. Converts the SearchQuery Pydantic model into a JSON Schema.
  2. Extracts the docstring for the tool description.
  3. Handles optional parameters with defaults.
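The schema-from-code idea can be demonstrated without the SDK at all. The sketch below derives a JSON-Schema-like tool description from a plain function's type hints and docstring using only the standard library; it mirrors the steps listed above but is not the SDK's actual generator.

```python
import inspect
import typing

def schema_from_function(fn) -> dict:
    """Illustrative schema generator: map a function's signature to a
    tool schema (name, description, parameters, required fields)."""
    hints = typing.get_type_hints(fn)
    hints.pop("return", None)
    sig = inspect.signature(fn)
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": type_map.get(hints.get(name), "object")}
        # Parameters without defaults are required; defaults make them optional.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": props, "required": required},
    }

def web_search(query: str, max_results: int = 5) -> list:
    """Perform a web search."""
    return []

schema = schema_from_function(web_search)
# schema["parameters"]["required"] contains only "query", since
# max_results has a default.
```

Pydantic-typed parameters, as in the `SearchQuery` example above, extend the same idea: the model already knows how to emit its own JSON Schema, so the generator can delegate to it.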

Memory & Session Management

The SDK provides a session‑based memory system that persists conversation context across turns and supports pluggable storage backends.

Session Protocol

The Session protocol defines methods for loading and saving RunState. Implementations include:

  • InMemorySession: For single‑process, ephemeral conversations.
  • RedisSession: For distributed, multi‑instance deployments.
  • OpenAISession: Synchronizes local state with OpenAI's Responses API using conversation_id.

from openai.agents import Runner, InMemorySession
 
session = InMemorySession()
state = session.load()  # Loads an existing RunState or creates a new one
 
# Run agent within the session
result = Runner.run(agent, "Hello!", state=state)
 
# Save updated state
session.save(state)

Context Window Management

The Runner automatically manages the LLM context window by:

  • Truncating older messages when the token limit is approached.
  • Preserving system prompts and critical metadata.
  • Supporting custom truncation strategies via the TruncationPolicy interface.
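A custom truncation strategy along these lines is straightforward to sketch. The function below keeps the system prompt intact and drops the oldest non-system messages until the estimated token count fits; the message shape and the word-count token estimate are simplifying assumptions, and the SDK's TruncationPolicy interface may look different.

```python
def truncate(messages, max_tokens,
             count_tokens=lambda m: len(m["content"].split())):
    """Illustrative truncation policy: preserve system messages,
    keep the most recent other messages that fit the token budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept, total = [], 0
    for m in reversed(rest):  # walk newest-first so recent turns survive
        cost = count_tokens(m)
        if total + cost > budget:
            break
        kept.append(m)
        total += cost
    return system + list(reversed(kept))
```

Dropping whole messages newest-last is the simplest policy; a production strategy might instead summarize the evicted turns, which is exactly the kind of variation a pluggable interface allows.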

Sandboxing & Security

The SDK enforces security boundaries through explicit executors for local tools, forcing developers to think about isolation upfront.

Hosted Sandboxing

For code execution, the CodeInterpreterTool runs within OpenAI's managed, isolated containers, providing a safe environment for arbitrary Python code.

Local Sandboxing

Tools like LocalShellTool and ShellTool require a developer‑provided executor—a callable that defines the execution environment (e.g., a Docker container, a restricted shell, or a virtual machine).

from openai.agents.tools import LocalShellTool
import subprocess
 
def docker_executor(command: str) -> str:
    """Run command inside a Docker container."""
    result = subprocess.run(
        ["docker", "run", "--rm", "python:3.12", "sh", "-c", command],
        capture_output=True, text=True
    )
    return result.stdout
 
shell_tool = LocalShellTool(executor=docker_executor)

This “security by design” approach ensures that local tool execution is always sandboxed according to the developer's specifications.
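Because the executor is just a callable, the isolation policy is entirely up to you. As a hypothetical alternative to the Docker executor above, an executor can enforce an allowlist before anything runs; the allowlist itself and this policy are illustrative, not part of the SDK.

```python
import shlex
import subprocess

ALLOWED = {"ls", "cat", "echo"}  # hypothetical allowlist of permitted programs

def restricted_executor(command: str) -> str:
    """Illustrative executor: refuse any command whose program is not
    explicitly allowlisted, then run it without a shell."""
    parts = shlex.split(command)
    if not parts or parts[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {command!r}")
    result = subprocess.run(parts, capture_output=True, text=True)
    return result.stdout
```

Passing the parsed argument list (rather than `shell=True`) also sidesteps shell-injection risks, a useful default for any executor that handles model-generated commands.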

Real‑Time & Streaming Capabilities

For low‑latency applications (voice, interactive chat), the SDK provides a RealtimeAgent and WebSocket‑based sessions.

WebSocket Pooling

The responses_websocket_session helper keeps WebSocket connections warm across multiple turns, significantly reducing the Time to First Token (TTFT).

from openai.agents import responses_websocket_session
 
async with responses_websocket_session() as session:
    result = session.run_streamed(agent, "Hello!")
    async for event in result:
        if event.type == "text_delta":
            print(event.text, end="")  # Stream tokens as they arrive

Event‑Driven Execution

The Runner.run_streamed() method yields a stream of semantic events:

  • reasoning_step: The agent's internal reasoning (if using a reasoning model).
  • tool_call: Invocation of a tool with arguments.
  • tool_result: The result of a tool call.
  • text_delta: Incremental text output.

This enables rich, real‑time UI updates and progressive rendering.

Summary of Design Patterns

  • Orchestration (Tool‑based Handoffs): Decouples agent logic; simplifies delegation.
  • Tool Execution (Schema‑from‑Code): Reduces boilerplate; ensures type safety.
  • Memory (Session Protocol): Pluggable storage; context window management.
  • Interactivity (Event Streaming): Supports low‑latency, real‑time UX.
  • Security (Explicit Executors): Forces intentional security boundaries for local tools.

Conclusion

The OpenAI Agents SDK represents a pragmatic, production‑oriented approach to multi‑agent development. By focusing on lightweight orchestration, automatic schema generation, and built‑in security boundaries, it lowers the barrier to building reliable agentic workflows.

What’s next? The SDK is still in early development (v0.14.0 as of April 2026), but its open‑source MIT license and modular design make it a compelling foundation for teams scaling AI agent deployments. For AI engineers, understanding these patterns is essential as we move from prototype to production.


This post was researched using Brave search and analyzed with Gemini CLI's @codebase‑investigator subagent, which cloned the openai/openai‑agents‑python repository and extracted the architectural insights and code examples shown above.
