API Flow Diagrams
Sequence diagrams for the two main API endpoints. For detailed event type documentation, see Events & Tool Handling.
Plain Text /responses Flow (No Tools)
The simplest path — user sends a prompt, LLM responds with text.
sequenceDiagram
title /responses — Plain Text (No Tools)
participant Client
participant AC as api_controller
participant RS as responses_service
participant LLM
Client->>+AC: POST /responses (stream=true, prompt)
AC->>+RS: convert_responses_to_chat_completions(request)
RS-->>-AC: chat_request
AC->>+LLM: POST /v1/chat/completions (streaming)
LLM-->>-AC: HTTP 200 (SSE stream begins)
AC->>+RS: process_chat_completions_stream(response_stream)
RS-->>Client: response.created (ResponseCreated)
RS-->>Client: response.in_progress (ResponseInProgress)
loop LLM streams text
LLM-->>RS: chunk: delta.content
RS-->>Client: response.output_text.delta (OutputTextDelta)
end
LLM-->>RS: chunk: finish_reason="stop"
RS->>RS: Save conversation history
RS-->>-Client: response.completed (ResponseCompleted)
/responses Flow with MCP Tool Execution
Full flow when the LLM invokes a tool registered with the MCP manager.
sequenceDiagram
title /responses — MCP Tool Execution
participant Client
participant AC as api_controller
participant RS as responses_service
participant LLM
participant MCPM as mcp_manager
participant MCPS as MCPServer (Subprocess)
Client->>+AC: POST /responses (stream=true, prompt, tools)
rect rgb(230, 240, 255)
note over AC, RS: 1. Request Preparation
AC->>+MCPM: get_mcp_tools()
MCPM-->>-AC: Returns cached MCP tools
AC->>AC: Injects MCP tools into request's 'tools' list
AC->>+RS: convert_responses_to_chat_completions(request)
RS-->>-AC: Returns chat_request (OpenAI format)
end
rect rgb(230, 255, 230)
note over AC, LLM: 2. Call LLM API
AC->>+LLM: POST /v1/chat/completions (streaming)
LLM-->>-AC: HTTP 200 OK (SSE stream begins)
end
rect rgb(255, 255, 224)
note over RS, Client: 3. Stream Processing & Tool Call Generation
AC->>+RS: process_chat_completions_stream(response_stream)
RS-->>Client: response.created (ResponseCreated)
RS-->>Client: response.in_progress (ResponseInProgress)
loop LLM streams back tool call
LLM-->>RS: chunk: tool_calls delta (name)
RS->>RS: Detects new tool call, appends to response output
RS-->>Client: response.in_progress (ResponseInProgress — snapshot with tool call)
LLM-->>RS: chunk: tool_calls delta (arguments fragment)
RS-->>Client: response.function_call_arguments.delta (ToolCallArgumentsDelta)
end
LLM-->>RS: chunk: finish_reason="tool_calls"
end
rect rgb(255, 230, 224)
note over RS, MCPS: 4. MCP Tool Execution
RS->>RS: Loop through completed tool calls
RS->>+MCPM: is_mcp_tool(tool_name)?
MCPM-->>-RS: returns true
RS-->>Client: response.function_call_arguments.done (ToolCallArgumentsDone)
RS->>+MCPM: execute_mcp_tool(tool_name, args)
MCPM->>+MCPS: session.call_tool(tool_name, args)
MCPS-->>-MCPM: tool result
MCPM-->>-RS: tool result
end
rect rgb(224, 230, 255)
note over RS, Client: 5. Emitting MCP Result
RS->>RS: serialize_tool_result(result)
RS->>RS: Add 'function_call_output' to response output
RS-->>Client: response.output_text.delta (OutputTextDelta — serialized result)
end
rect rgb(240, 240, 240)
note over RS, Client: 6. Finalizing Response
RS->>RS: Save conversation history
RS-->>-Client: response.completed (ResponseCompleted)
end
/responses Flow with Non-MCP Tool (Client-Executed)
When the LLM calls a tool that is not registered with MCP, the server returns it to the client for execution.
sequenceDiagram
title /responses — Non-MCP Tool (Client Executes)
participant Client
participant RS as responses_service
RS-->>Client: response.in_progress (snapshot: output contains function_call with status="ready")
RS-->>Client: response.function_call_arguments.delta (repeated)
RS-->>Client: response.function_call_arguments.done
RS-->>Client: response.completed
Note over Client: Client executes tool externally
Client->>RS: POST /responses (input includes function_call_output with call_id)
Note over RS: Matches call_id to history, adds tool result, continues conversation
/v1/chat/completions Flow with MCP Tool Loop
The chat completions endpoint has its own tool-call loop. Streaming mode uses non-streaming requests internally during tool resolution, then streams only the final response.
sequenceDiagram
title /v1/chat/completions — MCP Tool Loop
participant Client
participant CCS as chat_completions_service
participant LLM
participant MCPM as mcp_manager
participant MCPS as MCPServer (Subprocess)
Client->>+CCS: POST /v1/chat/completions (messages, tools)
rect rgb(230, 240, 255)
note over CCS, MCPM: 1. MCP Tool Injection
CCS->>MCPM: get_mcp_tools()
MCPM-->>CCS: cached MCP tools
CCS->>CCS: Merge MCP tools with existing tools (de-duplicate)
end
rect rgb(230, 255, 230)
note over CCS, LLM: 2. Tool Call Loop (up to MAX_TOOL_CALL_ITERATIONS)
loop Until no tool_calls or max iterations
CCS->>+LLM: POST /v1/chat/completions (non-streaming)
LLM-->>-CCS: Response with finish_reason
alt finish_reason == "tool_calls"
CCS->>+MCPM: execute_mcp_tool(name, args) per tool
MCPM->>+MCPS: session.call_tool(name, args)
MCPS-->>-MCPM: tool result
MCPM-->>-CCS: serialized result
CCS->>CCS: Append tool results to messages, continue loop
else finish_reason != "tool_calls"
CCS->>CCS: Break loop
end
end
end
alt Streaming mode
rect rgb(255, 255, 224)
note over CCS, Client: 3. Stream Final Response
CCS->>+LLM: POST /v1/chat/completions (stream=true, full message history)
LLM-->>-Client: SSE stream (proxied directly)
end
else Non-streaming mode
rect rgb(255, 255, 224)
note over CCS, Client: 3. Return Final Response
CCS-->>-Client: JSON response
end
end