# Open Responses Server
A plug-and-play server that speaks OpenAI’s Responses API — no matter which AI backend you’re running.
Find this useful? Star the repo to follow updates and show support!
Install from PyPI with `pip install open-responses-server` and run `otc start`. See CLI Usage for all options.
Ollama, vLLM, LiteLLM, Groq, or even OpenAI itself — this server bridges them all to the OpenAI Responses API interface. It handles stateful chat, tool calls, and MCP server integration behind a familiar API.
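As a sketch of what crosses the wire, here is a minimal Responses API request body. The `model` and `input` fields follow OpenAI's Responses API; the model name `llama3` is just an illustrative choice for an Ollama backend:

```python
import json

# A minimal Responses API request body. Field names follow OpenAI's
# Responses API; "llama3" is an example model for an Ollama backend.
payload = {
    "model": "llama3",
    "input": "Write a haiku about proxies.",
    "stream": True,  # request SSE events instead of one JSON response
}

# The server accepts this shape and translates it to whatever the
# configured backend speaks (e.g. Chat Completions).
body = json.dumps(payload)
print(body)
```

Clients built against OpenAI's Responses API can send this payload unchanged; only the base URL points at this server instead of OpenAI.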
## What’s Inside
- Architecture — Module map, request routing, configuration reference
- Events & Tools — SSE event types, emission sequences, tool lifecycle
- API Flow Diagrams — Mermaid sequence diagrams for both endpoints
- Testing Guide — Running tests, writing tests, coverage
- CLI Usage — `otc` commands and options
- Extending — Web search and RAG extension guide
- Security — Security scanning setup and policies
- Using uv — Fast Python environment setup with uv
- Publishing to PyPI — Release and publish workflow
## Quick Start
### Install
pip install open-responses-server
Or from source:
pip install uv
uv venv
uv pip install -e ".[dev]"
### Configure
otc configure
Or set environment variables:
export OPENAI_BASE_URL_INTERNAL=http://localhost:11434 # Your LLM backend
export OPENAI_BASE_URL=http://localhost:8080 # This server
export OPENAI_API_KEY=sk-your-key
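In Python, resolving these variables with fallbacks looks like the following sketch; the fallback values simply mirror the examples above and are not claims about the server's actual defaults:

```python
import os

# Resolve the documented environment variables, falling back to the
# example values shown above (illustrative, not the server's defaults).
backend_url = os.environ.get("OPENAI_BASE_URL_INTERNAL", "http://localhost:11434")
server_url = os.environ.get("OPENAI_BASE_URL", "http://localhost:8080")
api_key = os.environ.get("OPENAI_API_KEY", "sk-your-key")

print(backend_url, server_url)
```

Point `OPENAI_BASE_URL_INTERNAL` at your LLM backend and `OPENAI_BASE_URL` at this server; Responses API clients then talk to `OPENAI_BASE_URL`.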
### Run
otc start
Verify:
curl http://localhost:8080/v1/models
## Key Features
- Drop-in replacement for OpenAI’s Responses API
- Works with any OpenAI-compatible backend
- MCP server support for both Chat Completions and Responses APIs
- Supports OpenAI’s Codex CLI and other Responses API clients
- Stateful multi-turn conversations via in-memory history
- Tool call execution loop with configurable iteration limits
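The tool-call execution loop can be sketched as follows. This is a hypothetical illustration, not the server's actual internals: `call_model` stands in for the backend LLM, `TOOLS` for the registered (e.g. MCP) tools, and `MAX_TOOL_ITERATIONS` for the configurable limit:

```python
# Hypothetical sketch of a tool-call loop with an iteration cap.
# call_model and TOOLS are stand-ins, not the server's real internals.

MAX_TOOL_ITERATIONS = 3  # stands in for the configurable limit

TOOLS = {"add": lambda a, b: a + b}

def call_model(messages):
    # Stub model: request one tool call, then answer using its result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "add", "args": {"a": 2, "b": 3}}}
    result = next(m["content"] for m in messages if m["role"] == "tool")
    return {"content": f"The sum is {result}"}

def run(messages):
    for _ in range(MAX_TOOL_ITERATIONS):
        reply = call_model(messages)
        if "tool_call" not in reply:
            return reply["content"]  # final answer: exit the loop early
        call = reply["tool_call"]
        output = TOOLS[call["name"]](**call["args"])
        # Feed the tool result back so the next model call can use it.
        messages.append({"role": "tool", "content": output})
    return "tool iteration limit reached"

answer = run([{"role": "user", "content": "What is 2 + 3?"}])
print(answer)  # The sum is 5
```

The cap ensures a model that keeps requesting tools cannot loop forever; once the limit is hit, the loop stops instead of issuing another tool call.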
## About
Open Responses Server is an open-source project. It is not affiliated with or endorsed by OpenAI.
Licensed under MIT.