# Open Responses Server
A plug-and-play server that speaks OpenAI’s Responses API — no matter which AI backend you’re running.
Find this useful? Star the repo to follow updates and show support!
Install from PyPI with `pip install open-responses-server` and run `otc start`. See CLI Usage for all options.
Ollama, vLLM, LiteLLM, Groq, or even OpenAI itself — this server bridges them all to the OpenAI Responses API interface. It handles stateful chat, tool calls, and MCP server integration behind a familiar API.
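As a sketch of what crosses the wire, here is a minimal Responses API request body. The `model` and `input` fields follow OpenAI's Responses API; the model name `llama3` is just an illustrative choice for an Ollama backend:

```python
import json

# A minimal Responses API request body. Field names follow OpenAI's
# Responses API; "llama3" is an example model for an Ollama backend.
payload = {
    "model": "llama3",
    "input": "Write a haiku about proxies.",
    "stream": True,  # request SSE events instead of one JSON response
}

# The server accepts this shape and translates it to whatever the
# configured backend speaks (e.g. Chat Completions).
body = json.dumps(payload)
print(body)
```

Clients built against OpenAI's Responses API can send this payload unchanged; only the base URL points at this server instead of OpenAI.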
## What’s Inside
- Architecture — Module map, request routing, configuration reference
- Events & Tools — SSE event types, emission sequences, tool lifecycle
- API Flow Diagrams — Mermaid sequence diagrams for both endpoints
- Testing Guide — Running tests, writing tests, coverage
- CLI Usage — `otc` commands and options
- Extending — Web search and RAG extension guide
- Security — Security scanning setup and policies
- Using uv — Fast Python environment setup with uv
- Publishing to PyPI — Release and publish workflow
## Quick Start
### Install
pip install open-responses-server
Or from source:
pip install uv
uv venv
uv pip install -e ".[dev]"
### Configure
otc configure
Or set environment variables:
export OPENAI_BASE_URL_INTERNAL=http://localhost:11434 # Your LLM backend
export OPENAI_BASE_URL=http://localhost:8080 # This server
export OPENAI_API_KEY=sk-your-key
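In Python, resolving these variables with fallbacks looks like the following sketch; the fallback values simply mirror the examples above and are not claims about the server's actual defaults:

```python
import os

# Resolve the documented environment variables, falling back to the
# example values shown above (illustrative, not the server's defaults).
backend_url = os.environ.get("OPENAI_BASE_URL_INTERNAL", "http://localhost:11434")
server_url = os.environ.get("OPENAI_BASE_URL", "http://localhost:8080")
api_key = os.environ.get("OPENAI_API_KEY", "sk-your-key")

print(backend_url, server_url)
```

Point `OPENAI_BASE_URL_INTERNAL` at your LLM backend and `OPENAI_BASE_URL` at this server; Responses API clients then talk to `OPENAI_BASE_URL`.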
### Run
otc start
Verify:
curl http://localhost:8080/v1/models
## Key Features
- Drop-in replacement for OpenAI’s Responses API
- Works with any OpenAI-compatible backend
- MCP server support for both Chat Completions and Responses APIs
- Supports OpenAI’s Codex CLI and other Responses API clients
- Stateful multi-turn conversations via in-memory history
- Tool call execution loop with configurable iteration limits
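The tool-call execution loop can be sketched as follows. This is a hypothetical illustration, not the server's actual internals: `call_model` stands in for the backend LLM, `TOOLS` for the registered (e.g. MCP) tools, and `MAX_TOOL_ITERATIONS` for the configurable limit:

```python
# Hypothetical sketch of a tool-call loop with an iteration cap.
# call_model and TOOLS are stand-ins, not the server's real internals.

MAX_TOOL_ITERATIONS = 3  # stands in for the configurable limit

TOOLS = {"add": lambda a, b: a + b}

def call_model(messages):
    # Stub model: request one tool call, then answer using its result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "add", "args": {"a": 2, "b": 3}}}
    result = next(m["content"] for m in messages if m["role"] == "tool")
    return {"content": f"The sum is {result}"}

def run(messages):
    for _ in range(MAX_TOOL_ITERATIONS):
        reply = call_model(messages)
        if "tool_call" not in reply:
            return reply["content"]  # final answer: exit the loop early
        call = reply["tool_call"]
        output = TOOLS[call["name"]](**call["args"])
        # Feed the tool result back so the next model call can use it.
        messages.append({"role": "tool", "content": output})
    return "tool iteration limit reached"

answer = run([{"role": "user", "content": "What is 2 + 3?"}])
print(answer)  # The sum is 5
```

The cap ensures a model that keeps requesting tools cannot loop forever; once the limit is hit, the loop stops instead of issuing another tool call.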
## About
Open Responses Server is an open-source project. It is not affiliated with or endorsed by OpenAI.
Licensed under MIT.