# Open Responses Server — Architecture Overview

## What It Does
Open Responses Server is a FastAPI proxy that translates OpenAI’s Responses API into Chat Completions API calls, allowing any OpenAI-compatible backend (Ollama, vLLM, LiteLLM, Groq, or OpenAI itself) to serve the Responses API. It also provides a `/v1/chat/completions` endpoint with automatic MCP tool injection and a tool-call execution loop, plus a generic proxy for all other endpoints.
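The translation can be illustrated with a minimal sketch. The field names below follow the public Responses and Chat Completions schemas; the actual conversion (which also handles tools, streaming, and history) lives in `responses_service.py`, and this helper is purely illustrative:

```python
# Illustrative sketch of the Responses -> Chat Completions translation
# the proxy performs. Not the real implementation.
def responses_to_chat_completions(req: dict) -> dict:
    messages = []
    if req.get("instructions"):              # instructions -> system prompt
        messages.append({"role": "system", "content": req["instructions"]})
    user_input = req.get("input")
    if isinstance(user_input, str):          # plain-string input -> user turn
        messages.append({"role": "user", "content": user_input})
    else:                                    # list of input items
        for item in user_input or []:
            messages.append({"role": item.get("role", "user"),
                             "content": item.get("content", "")})
    return {"model": req["model"], "messages": messages, "stream": True}

chat_req = responses_to_chat_completions(
    {"model": "llama3", "instructions": "Be terse.", "input": "Hi"})
```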
## Module Map

| Module | Purpose |
|---|---|
| `api_controller.py` | FastAPI app, route definitions, CORS, startup/shutdown hooks, MCP tool injection for `/responses` |
| `responses_service.py` | Responses-to-Chat-Completions request conversion, SSE stream processing, conversation history |
| `chat_completions_service.py` | `/v1/chat/completions` handler with MCP tool injection and tool-call loop (streaming + non-streaming) |
| `common/config.py` | All configuration via environment variables, logging setup |
| `common/llm_client.py` | `LLMClient` singleton wrapping `httpx.AsyncClient` pointed at the backend |
| `common/mcp_manager.py` | `MCPManager` singleton: MCP server lifecycle, tool discovery/caching, execution, result serialization |
| `models/responses_models.py` | Pydantic models for all Responses API types and SSE streaming events |
| `server_entrypoint.py` | Uvicorn entry point (imports `app` from `api_controller`) |
| `cli.py` | `otc` CLI: `start`, `configure`, `help` commands |
| `version.py` | `__version__` string, read dynamically by setuptools |
## Request Routing

```
Client
 │
 ├─ POST /responses
 │    → api_controller.create_response()
 │    → MCP tools injected into request
 │    → responses_service.convert_responses_to_chat_completions()
 │    → LLM backend POST /v1/chat/completions (streaming)
 │    → responses_service.process_chat_completions_stream()
 │    → SSE events in Responses API format back to client
 │
 ├─ POST /v1/chat/completions
 │    → api_controller.chat_completions()
 │    → chat_completions_service.handle_chat_completions()
 │    → MCP tools injected into request
 │    → Tool-call loop (up to MAX_TOOL_CALL_ITERATIONS)
 │    → Final response streamed or returned as JSON
 │
 ├─ GET /health → {"status": "ok", "adapter": "running"}
 ├─ GET / → {"message": "Open Responses Server is running."}
 │
 └─ GET/POST /{path} (catch-all proxy)
      → Forwarded to LLM backend at /v1/{path}
      → Response proxied back (streaming or non-streaming)
```
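The tool-call loop on the `/v1/chat/completions` path can be sketched as follows. `call_backend` and `execute_tool` are stand-ins for the real LLM request and MCP tool execution in `chat_completions_service.py`; only the loop shape and the `MAX_TOOL_CALL_ITERATIONS` bound come from the source:

```python
# Sketch of the bounded tool-call loop: call the model, execute any
# requested tools, feed results back, and stop on a final answer or
# when the iteration limit is reached. Illustrative only.
MAX_TOOL_CALL_ITERATIONS = 25

def run_tool_loop(messages, call_backend, execute_tool):
    reply = {}
    for _ in range(MAX_TOOL_CALL_ITERATIONS):
        reply = call_backend(messages)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:                   # model produced a final answer
            return reply
        messages.append(reply)               # record the assistant turn
        for call in tool_calls:              # execute each requested tool
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(call),
            })
    return reply                             # iteration limit reached
```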
## Configuration

All configuration is via environment variables, loaded from `.env` via python-dotenv and defined in `common/config.py`.

| Variable | Default | Description |
|---|---|---|
| `OPENAI_BASE_URL_INTERNAL` | `http://localhost:8000` | Backend LLM API URL |
| `OPENAI_BASE_URL` | `http://localhost:8080` | This server’s external URL |
| `OPENAI_API_KEY` | `dummy-key` | API key passed to backend |
| `API_ADAPTER_HOST` | `0.0.0.0` | Server bind address |
| `API_ADAPTER_PORT` | `8080` | Server port |
| `MCP_TOOL_REFRESH_INTERVAL` | `10` | Seconds between MCP tool cache refreshes |
| `MCP_SERVERS_CONFIG_PATH` | `src/open_responses_server/servers_config.json` | Path to MCP servers JSON config (use an absolute path when pip-installed) |
| `MAX_CONVERSATION_HISTORY` | `100` | Max stored conversation entries |
| `MAX_TOOL_CALL_ITERATIONS` | `25` | Max tool-call loop iterations |
| `STREAM_TIMEOUT` | `120.0` | HTTP timeout (seconds) for streaming requests |
| `HEARTBEAT_INTERVAL` | `15.0` | SSE keepalive interval (seconds) |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| `LOG_FILE_PATH` | `./log/api_adapter.log` | Path to log file |
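For example, a `.env` pointing the adapter at a local Ollama backend might look like this (the values below are illustrative, not defaults):

```bash
# Backend LLM API (here: a local Ollama instance)
OPENAI_BASE_URL_INTERNAL=http://localhost:11434
OPENAI_API_KEY=dummy-key

# Where this adapter listens
API_ADAPTER_HOST=0.0.0.0
API_ADAPTER_PORT=8080

# MCP configuration (absolute path, as recommended for pip installs)
MCP_SERVERS_CONFIG_PATH=/etc/open-responses/servers_config.json
MCP_TOOL_REFRESH_INTERVAL=10

LOG_LEVEL=DEBUG
```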
## MCP Server Configuration

The JSON file at `MCP_SERVERS_CONFIG_PATH` defines MCP servers. Three transport types are supported: `stdio` (the default), `sse`, and `streamable-http`.

```json
{
  "mcpServers": {
    "stdio-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
      "env": {"KEY": "value"}
    },
    "sse-server": {
      "type": "sse",
      "url": "http://example.com/sse",
      "headers": {"Authorization": "Bearer token"}
    },
    "http-server": {
      "type": "streamable-http",
      "url": "http://example.com/mcp",
      "headers": {"Authorization": "Bearer token"}
    }
  }
}
```

The `type` field defaults to `stdio` if omitted. Stdio servers use the `command`, `args`, and `env` fields; `sse` and `streamable-http` servers use `url` and an optional `headers` field.
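The defaulting and per-transport field rules can be sketched with a small normalizer. This is a hypothetical helper, not the loader in `common/mcp_manager.py`; only the field names and the stdio default come from the config format above:

```python
import json

# Illustrative normalizer for a servers_config.json entry: "type"
# falls back to "stdio"; stdio entries carry command/args/env, while
# sse and streamable-http entries carry url/headers.
def normalize_server(name: str, spec: dict) -> dict:
    kind = spec.get("type", "stdio")          # type defaults to stdio
    if kind == "stdio":
        return {"name": name, "type": kind,
                "command": spec["command"],
                "args": spec.get("args", []),
                "env": spec.get("env", {})}
    if kind in ("sse", "streamable-http"):
        return {"name": name, "type": kind,
                "url": spec["url"],
                "headers": spec.get("headers", {})}
    raise ValueError(f"unknown MCP transport: {kind}")

config = json.loads('{"mcpServers": {"fs": {"command": "npx", "args": []}}}')
servers = [normalize_server(n, s) for n, s in config["mcpServers"].items()]
```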
## Startup / Shutdown Lifecycle

### Startup

- **LLM Client** — `startup_llm_client()` creates the `httpx.AsyncClient` singleton pointed at `OPENAI_BASE_URL_INTERNAL`
- **MCP Servers** — `mcp_manager.startup_mcp_servers()`:
  - Loads `servers_config.json`
  - Initializes each `MCPServer` via stdio (subprocess)
  - Populates the tool cache with an initial tool discovery
  - Builds a `tool_name → server_name` mapping for fast lookup
  - Starts the background refresh task (periodic tool re-discovery)

### Shutdown

- **LLM Client** — closes the `httpx.AsyncClient`
- **MCP Servers** — cancels the refresh task, calls `cleanup()` on each server
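The start-a-refresh-task / cancel-it-cleanly pattern above can be sketched with plain asyncio. The class and names here are illustrative; the real code hangs these steps off FastAPI startup/shutdown hooks in `api_controller.py`:

```python
import asyncio

# Illustrative lifecycle sketch: a background refresh loop started on
# startup and cancelled cleanly on shutdown, mirroring the MCP
# manager's periodic tool re-discovery.
class RefreshLoop:
    def __init__(self, interval: float):
        self.interval = interval
        self.refreshes = 0
        self._task = None

    async def _run(self):
        while True:
            self.refreshes += 1              # stand-in for tool re-discovery
            await asyncio.sleep(self.interval)

    async def startup(self):
        self._task = asyncio.create_task(self._run())

    async def shutdown(self):
        self._task.cancel()                  # cancel the refresh task
        try:
            await self._task
        except asyncio.CancelledError:
            pass                             # expected on clean shutdown

async def main():
    loop = RefreshLoop(interval=0.01)
    await loop.startup()
    await asyncio.sleep(0.05)                # let a few refreshes run
    await loop.shutdown()
    return loop.refreshes

count = asyncio.run(main())
```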
## Conversation History

An in-memory dictionary keyed by `response_id` stores the full message list for multi-turn conversations. A subsequent request loads it by passing `previous_response_id`. The store is trimmed when it exceeds `MAX_CONVERSATION_HISTORY` entries.

See Events & Tool Handling for details on save points and message validation.
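A minimal sketch of this store, assuming oldest-first trimming (the real implementation lives in `responses_service.py` and may trim differently):

```python
from collections import OrderedDict

# Illustrative in-memory conversation store: message lists keyed by
# response_id, with the oldest entries dropped once the cap is hit.
MAX_CONVERSATION_HISTORY = 100

conversations: OrderedDict = OrderedDict()

def save_conversation(response_id: str, messages: list) -> None:
    conversations[response_id] = messages
    conversations.move_to_end(response_id)        # mark as most recent
    while len(conversations) > MAX_CONVERSATION_HISTORY:
        conversations.popitem(last=False)         # drop the oldest entry

def load_previous(previous_response_id: str) -> list:
    return conversations.get(previous_response_id, [])
```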
## Further Reading

- Events & Tool Handling — SSE event types, emission sequences, tool lifecycle
- API Flow Diagrams — Mermaid sequence diagrams for both endpoints
- Testing Guide — Running tests and writing new ones
- CLI Usage — `otc` commands