Build Better AI Agents with MCP and OpenAI’s Agent SDK
BuildAIers Toolkit #7: Combine the power of Anthropic’s Model Context Protocol (MCP) with OpenAI’s Agent SDK to build smarter, tool-aware, and even voice-enabled AI agents.
TL;DR
Model Context Protocol (MCP) offers a way to extend large models with tools and resources
OpenAI Agent SDK is an agent framework that simplifies the process of building AI agents
By combining these two technologies, you can build better agents that are capable of much more, such as voice agents powered by MCP.
If you are building AI agents, there are several ways to go about it. You could use agent toolkits such as LangGraph, smolagents, CrewAI, AutoGen, Google ADK, and PydanticAI. Alongside these options, one tool has quickly become a favorite among AI builders: Anthropic’s Model Context Protocol (MCP).
While not a toolkit itself, MCP has changed how we build AI agents: it gives agents a standard way to access external tools. So one question comes to mind: should you pick MCP over your current agent toolkit?
The answer is no. MCP doesn’t seek to compete with your current AI agent stack; it seeks to complete it. You can integrate MCP into almost any agent toolkit, and in fact most agent toolkits now support it.
For instance, the OpenAI Agent SDK supports MCP through the MCPServerStdio and MCPServerSse classes, which let you connect to MCP servers.
In this article, we will see how to build MCP-powered agents using the OpenAI Agent SDK as our agent toolkit. Over the course of the article, you will learn the following:
How to connect an agent to a local MCP server using the MCPServerStdio
How to connect an agent to a remote MCP server using the MCPServerSse
How to build voice agents with MCP and OpenAI Agent SDK
With that let’s get started 🚀
🛠️ Building Local Agents with MCPServerStdio
The OpenAI Agent SDK introduced support for MCP in version 0.0.7, and with it came the MCPServerStdio class. This class enables the Agent SDK to connect to local MCP servers, allowing agents to interact with local tools like file systems, databases, or other services.
💡 What is MCPServerStdio?
MCPServerStdio is used inside an asynchronous context manager. It takes in the command-line arguments required to launch an MCP-compatible server locally.
Here's the basic structure:
Python:
async with MCPServerStdio(
    params={
        "command": "cmd",
        "args": ["arg1", "arg2"],
    }
) as server:
    pass
This context manager yields an instance of the running MCP server, which you can assign to a variable (in this case, server) and use within the context block.
The params dictionary mimics a terminal command but breaks it down into:
"command": The command you would normally run (e.g. python, npx).
"args": A list of arguments you'd pass to that command.
So, for the terminal command:
Unset:
python main.py
The equivalent params would be:
Unset:
{
    "command": "python",
    "args": ["main.py"],
}
🧠 Creating an Agent that Uses the MCP Server
Once we’ve set up the MCP server, we can define an agent that uses it:
Python:
agent = Agent(
    name="Assistant",
    instructions="You are a friendly AI assistant",
    # tools = [tool_1, tool_2]
    mcp_servers=[server],
)
In a typical Agent SDK setup, you pass tools to the agent via the tools argument. But when using MCP, you use the mcp_servers argument instead. This allows your agent to interface with one or more MCP servers.
Let’s put both parts together:
Python:
async with MCPServerStdio(
    params={
        "command": "cmd",
        "args": ["arg1", "arg2"],
    }
) as server:
    agent = Agent(
        name="Assistant",
        instructions="You are a friendly AI assistant",
        mcp_servers=[server],
    )
Now the agent has access to the MCP server and can use it like any other tool.
🗂️ FileSystem AI Agent with MCP
Let’s build a simple AI agent that interacts with the local filesystem, able to browse directories, read/write files, and more.
To do this, we’ll use the MCP filesystem server, which is already available as an npm package.
🧱 Boilerplate Setup
Here’s the basic scaffold for our program:
Python:
import asyncio
import os

from agents import Agent, Runner
from agents.mcp import MCPServer, MCPServerStdio

async def run(mcp_server: MCPServer):
    pass

async def main():
    pass

if __name__ == "__main__":
    asyncio.run(main())
We define two functions:
run: Contains the agent logic and runs the agent with the provided MCP server.
main: The entry point of the script. It will initialize the MCP server and pass it into run().
🔌 Starting the FileSystem MCP Server
To launch the MCP filesystem server, we'll use npx:
Unset:
npx -y @modelcontextprotocol/server-filesystem sample_folder
The final argument is the folder the agent is allowed to access. For safety and clarity, restrict access to a specific folder.
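To make the idea of a restricted folder concrete, here is a minimal sketch of the kind of containment check a sandboxed file tool might perform. This is an assumption for illustration only, not the filesystem server's actual implementation, and is_inside is a hypothetical helper name.

```python
from pathlib import Path


def is_inside(allowed_root: str, candidate: str) -> bool:
    """Return True if candidate resolves to a path within allowed_root.

    Hypothetical helper for illustration; not part of any MCP package.
    """
    root = Path(allowed_root).resolve()
    target = (root / candidate).resolve()
    # Inside means: the target is the root itself, or the root is one of its parents.
    return target == root or root in target.parents


if __name__ == "__main__":
    print(is_inside("sample_folder", "notes.txt"))
    print(is_inside("sample_folder", "../secrets.txt"))
```

Resolving both paths before comparing them is what catches "../" tricks: the comparison happens on the final location, not on the text of the path.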
Here's the updated main function:
Python:
async def main():
    current_dir = os.path.dirname(os.path.abspath(__file__))
    samples_dir = os.path.join(current_dir, "sample_folder")
    async with MCPServerStdio(
        name="Filesystem Server, via npx",
        params={
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", samples_dir],
        },
    ) as server:
        await run(server)
🤖 Implementing the run() Function
Python:
async def run(mcp_server: MCPServer):
    agent = Agent(
        name="Assistant",
        instructions="Use the tools to read the filesystem and answer questions based on those files.",
        mcp_servers=[mcp_server],
    )
    input_history = []
    while True:
        user_input = input("\nProvide your prompt (or type 'exit' to quit): ")
        if user_input.lower() == "exit":
            print("Exiting chat.")
            break
        input_history.append(
            {
                "role": "user",
                "content": user_input
            }
        )
        result = await Runner.run(starting_agent=agent, input=input_history)
        print("Assistant:", result.final_output)
        input_history = result.to_input_list()
We maintain a chat history and continuously prompt the user for input in a REPL loop. The Runner.run() method feeds the entire conversation to the agent and returns a result we can print to the terminal.
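To see how the history grows from turn to turn without calling a model, the sketch below swaps the agent for a stand-in. FakeResult and fake_run are hypothetical stubs that only mimic the shape used in the loop above (a final_output plus a to_input_list() that returns the prior input with the reply appended); the real SDK objects carry more detail.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Stand-in for the run result; not the SDK's class."""
    final_output: str
    items: list = field(default_factory=list)

    def to_input_list(self) -> list:
        # Prior input plus the new assistant reply.
        return list(self.items)


async def fake_run(input_history: list) -> FakeResult:
    """Echo the latest user message instead of calling a model."""
    reply = f"You said: {input_history[-1]['content']}"
    items = input_history + [{"role": "assistant", "content": reply}]
    return FakeResult(final_output=reply, items=items)


async def chat(prompts: list) -> list:
    """Run the same append / run / replace-history cycle as the REPL loop."""
    input_history: list = []
    for user_input in prompts:
        input_history.append({"role": "user", "content": user_input})
        result = await fake_run(input_history)
        input_history = result.to_input_list()
    return input_history


if __name__ == "__main__":
    for item in asyncio.run(chat(["hello", "list my files"])):
        print(item["role"], "->", item["content"])
```

Each turn hands the full history back in, which is why the agent can refer to earlier messages.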
▶️ Running the Program
Ensure you have npx installed (comes with Node.js).
Create a sample_folder in the same directory as your script:
Unset:
mkdir sample_folder
Run your Python script, and start chatting with your file-aware AI assistant!
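If you prefer to prepare the folder from Python, a small setup sketch could look like this. The file name notes.txt and its contents are made up for illustration, and both helper names are hypothetical; shutil.which only checks that an npx executable is on your PATH.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_sample_folder(base_dir: str) -> Path:
    """Create sample_folder under base_dir with one example file."""
    folder = Path(base_dir) / "sample_folder"
    folder.mkdir(parents=True, exist_ok=True)
    note = folder / "notes.txt"  # hypothetical example file
    if not note.exists():
        note.write_text("Remember to try the filesystem agent.\n", encoding="utf-8")
    return folder


def npx_available() -> bool:
    """Return True if an npx executable can be found on PATH."""
    return shutil.which("npx") is not None


if __name__ == "__main__":
    print("npx found:", npx_available())
    # Demonstrated in a throwaway directory; pass your script's directory in practice.
    with tempfile.TemporaryDirectory() as tmp:
        folder = prepare_sample_folder(tmp)
        print(sorted(p.name for p in folder.iterdir()))
```

Giving the agent at least one file to read makes the first conversation more useful than starting with an empty folder.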
You can find the complete source code in the GitHub repository.
⚡ Building Remote Agents with MCPServerSse
Now that we've seen how to connect an agent to a local MCP server using MCPServerStdio, let’s explore how to connect to a remote MCP server using MCPServerSse.
💡 What is MCPServerSse?
MCPServerSse is similar to its stdio counterpart, but instead of launching a local subprocess from command-line parameters, it connects to an MCP server over HTTP using Server-Sent Events (SSE). This makes it a good fit for distributed or microservice-based agent architectures.
Here's a simple example of how to connect to a remote MCP server hosted on localhost:
Python:
async with MCPServerSse(
    name="SSE Python Server",
    params={
        "url": "http://localhost:8000/sse",
    },
) as server:
    pass
⚙️ Understanding the Parameters
url: The HTTP endpoint where the MCP server exposes its /sse stream. This is where the SDK establishes a persistent connection to receive data from the server.
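For intuition about what travels over that connection: SSE is a plain-text format in which each event is a group of "field: value" lines ended by a blank line. The sketch below parses such text with the standard library only. It is a simplified illustration (it ignores comments and the id and retry fields), not the SDK's client, and the sample payloads are invented.

```python
def parse_sse(text: str) -> list:
    """Parse SSE-formatted text into a list of {"event", "data"} dicts."""
    events = []
    event_type, data_lines = "message", []
    for line in text.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    return events


if __name__ == "__main__":
    # Invented sample stream: one named event, then one default "message" event.
    sample = "event: endpoint\ndata: /messages/?session_id=abc\n\ndata: hello\n\n"
    for event in parse_sse(sample):
        print(event)
```

Because the stream only flows from server to client, the client needs a separate HTTP endpoint to send its own messages back.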
🛠️ Building an MCP Remote Server
The MCPServerSse acts as a client, so we need to build a server that speaks the MCP protocol via SSE. You can do this easily using the FastMCP class provided by the MCP SDK.
FastMCP offers a convenient method .sse_app() that transforms the MCP server into a Starlette ASGI app. You can then serve it using any ASGI server, such as Uvicorn.
Here’s a minimal example:
Python:
from starlette.applications import Starlette
from starlette.routing import Mount
from mcp.server.fastmcp import FastMCP
import uvicorn

# Create an MCP server
mcp = FastMCP("Tutorial")

# Mount the SSE server to the existing ASGI server
app = Starlette(
    routes=[
        Mount('/', app=mcp.sse_app()),
    ]
)

if __name__ == '__main__':
    uvicorn.run('server:app', port=8000)
This exposes two key endpoints:
/sse: for streaming events to the Agent SDK.
/messages/: a POST endpoint the SDK uses to send messages.
🧰 Adding a Tool to the Server
You can register tools on your server using the @mcp.tool() decorator. Let’s enhance our server with a utility that returns the current time in a given timezone:
Python:
from starlette.applications import Starlette
from starlette.routing import Mount
from mcp.server.fastmcp import FastMCP
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError
import uvicorn

# Create an MCP server
mcp = FastMCP("Tutorial")

# Define tools
@mcp.tool()
def get_time_in_timezone(timezone: str) -> str:
    """
    Given a timezone this tool returns the current time in that time zone
    """
    try:
        tz = ZoneInfo(timezone)
        current_time = datetime.now(tz)
        return current_time.strftime("%Y-%m-%d %H:%M:%S %Z%z")
    # Unknown zone names raise ZoneInfoNotFoundError (a KeyError subclass);
    # malformed keys raise ValueError.
    except (ZoneInfoNotFoundError, ValueError):
        return "Invalid time zone. Please provide a valid time zone name."

# Mount the SSE server to the existing ASGI server
app = Starlette(
    routes=[
        Mount('/', app=mcp.sse_app()),
    ]
)

if __name__ == '__main__':
    uvicorn.run('server:app', port=8000)
Now when the agent queries this tool, the server will return the appropriate response.
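Because the tool body is ordinary Python, its logic can be checked without starting the server. The sketch below is a standalone copy of that logic under a different, hypothetical name (format_time_in_timezone). One detail worth noting: for an unknown zone name, ZoneInfo raises ZoneInfoNotFoundError, a subclass of KeyError, so the sketch catches that alongside ValueError.

```python
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def format_time_in_timezone(timezone: str) -> str:
    """Return the current time in the given IANA time zone, or an error message."""
    try:
        tz = ZoneInfo(timezone)
    except (ZoneInfoNotFoundError, ValueError):
        return "Invalid time zone. Please provide a valid time zone name."
    return datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S %Z%z")


if __name__ == "__main__":
    print(format_time_in_timezone("Africa/Lagos"))
    print(format_time_in_timezone("Not/A_Zone"))
```

Returning a readable message instead of raising lets the model see what went wrong and ask the user for a valid zone name.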
🧠 Building the Agent
Your agent code stays almost the same; just switch to using MCPServerSse instead of MCPServerStdio.
Python:
import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServer, MCPServerSse

async def run(mcp_server: MCPServer):
    agent = Agent(
        name="Assistant",
        # This server exposes a time tool rather than filesystem tools
        instructions="Use the tools to answer the user's questions.",
        mcp_servers=[mcp_server],
    )
    input_history = []
    while True:
        user_input = input("\nProvide your prompt (or type 'exit' to quit): ")
        if user_input.lower() == "exit":
            print("Exiting chat.")
            break
        input_history.append(
            {
                "role": "user",
                "content": user_input
            }
        )
        result = await Runner.run(starting_agent=agent, input=input_history)
        print("Assistant:", result.final_output)
        input_history = result.to_input_list()

async def main():
    async with MCPServerSse(
        name="SSE Python Server",
        params={
            "url": "http://localhost:8000/sse",
        },
    ) as server:
        await run(server)

if __name__ == "__main__":
    asyncio.run(main())
Make sure your server.py is running before launching the agent script (main.py).
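One way to avoid starting the agent too early is to wait until the server's port accepts connections. The following sketch uses only the standard library; wait_for_port is a hypothetical helper, and the host, port, and timeout values are assumptions matching the example above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, else False after timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # server not up yet; retry shortly
    return False


if __name__ == "__main__":
    if wait_for_port("localhost", 8000, timeout=1.0):
        print("Server is reachable; start the agent.")
    else:
        print("Server is not reachable; start server.py first.")
```

A successful TCP connection only shows that something is listening on the port; it does not confirm that the MCP endpoints are healthy.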
You can find the complete source code in the GitHub repository.
🗣️ Combining Voice Agents with MCP
OpenAI added support for Voice Agents in version 0.0.6 of the SDK. This exciting addition allows agents to communicate with humans using natural speech. In this section, we’ll walk you through how to combine Voice Agents with MCP to unlock powerful workflows that can interact with files, APIs, and more using voice.
This guide focuses only on integrating MCP into a Voice Agent. If you’re new to Voice Agents, check out our guide: Building Voice Agents With OpenAI Agent SDK for a full walkthrough.
🧠 Voice Agent Anatomy: Workflow + Pipeline
A Voice Agent in the Agent SDK has two main components:
Workflow: Handles the core logic, like agent behavior.
Pipeline: Connects the workflow with a Text-to-Speech (TTS) model and a Speech-to-Text (STT) model, forming the full voice interaction loop.
Our MCP integration will live inside the workflow.
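The division of labour can be pictured with plain functions. In the sketch below, fake_stt, fake_tts, EchoWorkflow, and TinyPipeline are invented stand-ins, not SDK classes: the pipeline turns audio into text, hands the text to the workflow, and turns the workflow's reply back into audio.

```python
import asyncio


def fake_stt(audio: bytes) -> str:
    """Pretend speech-to-text: decode bytes as UTF-8 text."""
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    """Pretend text-to-speech: encode text as UTF-8 bytes."""
    return text.encode("utf-8")


class EchoWorkflow:
    """Stand-in workflow: this is where the agent (and MCP) logic would live."""

    async def run(self, transcription: str) -> str:
        return f"You asked: {transcription}"


class TinyPipeline:
    """Stand-in pipeline wiring STT -> workflow -> TTS."""

    def __init__(self, workflow: EchoWorkflow):
        self.workflow = workflow

    async def run(self, audio: bytes) -> bytes:
        text = fake_stt(audio)
        reply = await self.workflow.run(text)
        return fake_tts(reply)


if __name__ == "__main__":
    out = asyncio.run(TinyPipeline(EchoWorkflow()).run(b"what files do I have?"))
    print(out.decode("utf-8"))
```

Since the workflow only ever sees text, adding MCP tools there leaves the audio side of the pipeline untouched.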
🧱 Building the Workflow with MCP
A standard text-based agent usually includes a run() method to process the agent’s interaction. In a voice-enabled agent with MCP, we extend this to also include create() and cleanup() methods.
Here's what our VoiceAgentWorkflow looks like:
Python:
import os

from agents import Agent, TResponseInputItem
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions
from agents.mcp import MCPServerStdio
from agents.voice import VoiceWorkflowBase

class VoiceAgentWorkflow(VoiceWorkflowBase):
    def __init__(self, agent: Agent, on_start, server: MCPServerStdio):
        self._input_history: list[TResponseInputItem] = []
        self._current_agent = agent
        self._on_start = on_start
        self._server = server

    async def run(self, transcription: str):
        pass

    @classmethod
    async def create(cls, on_start):
        current_dir = os.path.dirname(os.path.abspath(__file__))
        samples_dir = os.path.join(current_dir, "sample_folder")
        server = MCPServerStdio(
            name="Filesystem Server, via npx",
            params={
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", samples_dir],
            },
        )
        # Connect explicitly instead of using "async with" so the server stays up
        await server.connect()
        assistant_agent = Agent(
            name="Assistant",
            instructions=prompt_with_handoff_instructions(
                "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the Spanish agent."
            ),
            mcp_servers=[server],
        )
        return cls(agent=assistant_agent, on_start=on_start, server=server)

    async def cleanup(self):
        await self._server.cleanup()
🛠 create() Method
This is a class method that builds your workflow and connects the MCP server using MCPServerStdio. Unlike other examples, we avoid using a context manager (async with) so that we can maintain the connection throughout the lifetime of the app.
🧹 cleanup() Method
This ensures that the MCP server is gracefully cleaned up after the app exits.
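The pattern behind create() and cleanup() is an async factory paired with an explicit teardown: __init__ cannot await, so a class method performs the connection, and a try/finally guarantees the cleanup. The sketch below shows the pattern with a stand-in FakeServer; none of these names come from the SDK.

```python
import asyncio


class FakeServer:
    """Stand-in for a server object with connect/cleanup methods."""

    def __init__(self):
        self.connected = False

    async def connect(self):
        self.connected = True

    async def cleanup(self):
        self.connected = False


class Workflow:
    def __init__(self, server: FakeServer):
        self._server = server

    @classmethod
    async def create(cls) -> "Workflow":
        # __init__ cannot await, so connect here and pass the live server in.
        server = FakeServer()
        await server.connect()
        return cls(server)

    async def cleanup(self):
        await self._server.cleanup()


async def demo() -> tuple:
    """Return the connection state while running and after cleanup."""
    workflow = await Workflow.create()
    during = workflow._server.connected
    try:
        pass  # the app would run here
    finally:
        await workflow.cleanup()
    return during, workflow._server.connected


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off compared with async with is that teardown becomes the caller's responsibility, hence the try/finally.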
Here’s a link to the full workflow code.
🎛️ Building the Voice Agent Pipeline
We’ll also need to implement a create method for the pipeline so it can instantiate the asynchronous workflow. Here's how:
Python:
class RealtimeCLIApp:
    def __init__(self, workflow: VoiceAgentWorkflow):
        self.should_send_audio = asyncio.Event()
        self.pipeline = VoicePipeline(
            workflow=workflow,
            stt_model="gpt-4o-transcribe",
            tts_model="gpt-4o-mini-tts",
        )
        self._audio_input = StreamedAudioInput()
        self.audio_player = sd.OutputStream(
            samplerate=SAMPLE_RATE, channels=CHANNELS, dtype=FORMAT
        )

    @classmethod
    async def create(cls) -> "RealtimeCLIApp":
        workflow = await VoiceAgentWorkflow.create(on_start=cls._on_transcription_static)
        return cls(workflow)

    @staticmethod
    def _on_transcription_static(transcription: str):
        print(f"Transcription: {transcription}")

    async def start_voice_pipeline(self):
        pass

    async def send_mic_audio(self):
        pass

    async def run(self):
        pass

async def main():
    app = await RealtimeCLIApp.create()
    try:
        await app.run()
    finally:
        await app.pipeline.workflow.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
We call cleanup() at the end of main() to properly shut down the MCP server.
Here’s a link to the full pipeline code.
🚀 Running the Voice Agent
To run everything smoothly, install the required dependencies:
Unset:
pip install 'openai-agents[voice]'
pip install sounddevice
Now, place:
the workflow code in workflow.py
the pipeline in main.py
Run the app with:
Unset:
python main.py
You’ll see a prompt in the terminal:
Unset:
🎙️ Press 'K' to start/stop recording, 'Q' to quit.
🏁 Conclusion
The combination of MCP and the OpenAI Agent SDK unlocks a whole new world of possibilities for building powerful, tool-using agents. By tapping into the growing ecosystem of MCP servers, your agents can seamlessly interact with real-world tools and resources.
The OpenAI Agent SDK makes it incredibly simple to build these agents, handling the underlying complexities of memory, workflows, and context management so you can focus on what your agent should do, not how it works.
Take the time to explore using MCP plus Agent SDK so you can build better agents. 🚀



