phoson-engine-minimal
v0.1.0 · Open Source · MIT
Minimal Python runtime for the Phoson autonomous-agent platform. Framework-free ReAct loop built directly on provider SDKs — no LangChain, no LangGraph. Full control over streaming, tool execution, cost tracking, and session branching.
Why Phoson Engine?
| Traditional Frameworks | Phoson Engine |
|---|---|
| Heavy dependencies | Zero external agent frameworks |
| Linear conversations | Branchable conversation trees |
| Black-box streaming | Full typed event visibility |
| Fixed ReAct patterns | Custom ReAct loop |
| Enterprise pricing | MIT licensed & open source |
Installation
Clone the repository and install dependencies with uv.
git clone https://github.com/phoson-lat/phoson-engine-minimal.git
cd phoson-engine-minimal
# Install all dependencies (including dev)
uv sync --dev --locked
# Install git hooks (optional but recommended)
uv run pre-commit install --install-hooks
uv run pre-commit install --hook-type commit-msg
uv run pre-commit install --hook-type pre-push
Run checks locally
uv run ruff format --check .
uv run ruff check .
uv run python -m compileall phoson_llm phoson_agent phoson_cli
uv run pytest -q
Quick Start
Run a minimal agent with a tool in under 20 lines.
from phoson_agent import AgentEngine, tool
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.schemas import Message, ModelConfig
# 1. Define a tool with the @tool decorator
@tool
def get_weather(city: str, country: str = "MX") -> dict:
"""Returns current weather for a given city."""
return {"city": city, "condition": "sunny", "temperature_c": 27}
# 2. Create the engine
engine = AgentEngine(
chat=OpenAIChat(), # reads OPENAI_API_KEY from env
tools=[get_weather],
phoson_weight=1.2, # credits multiplier
)
# 3. Run synchronously
result = engine.run_sync(
messages=[Message(role="user", content="What's the weather in Querétaro?")],
config=ModelConfig(model="openai/gpt-4o-mini", max_tokens=512),
)
print(result.final_content)
print(f"Cost: ${result.total_cost_usd:.6f} | Credits: {result.total_credits:.4f}")Interactive CLI
Launch the interactive REPL for live agent sessions with streaming and session branching.
# Start the interactive REPL
uv run phoson-cli
# Run the setup wizard (configure provider & model)
uv run phoson-cli --setup
Architecture
The engine is composed of three independent Python packages that can be used separately or together.
- phoson_llm: LLM normalization layer. Provider adapters + typed event stream + pricing.
- phoson_agent: ReAct agent loop, tool execution, middleware hooks, plugin system, session trees.
- phoson_cli: Interactive REPL with streaming, branching, model switching, and session picker.
Runtime loop
The ReAct loop iterates up to max_iterations times. Each iteration calls the LLM, executes any tool calls, and feeds results back until the model produces a final answer.
Client → AgentEngine.run(messages, config)
└─ AgentLoop.run_iteration()
├─ LLM.stream() → TokenEvent / ToolCallEvent / UsageEvent / LLMDoneEvent
│
├─ [if ToolCallEvent]
│ └─ ToolRunner.execute(args) → result / error
│ └─ history += ToolResultBlock
│ └─ continue loop
│
└─ [if LLMDoneEvent, no tools]
    └─ AgentDoneEvent → AgentRunResult
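The same flow as a schematic Python sketch. This is illustrative only: chat, run_tool, and tool_result_message are stand-ins for the engine's internals (AgentLoop, ToolRunner), not public API.

# Schematic only: the real loop lives in AgentLoop / ToolRunner.
async def react_loop(messages, config, max_iterations=12):
    history = list(messages)
    for _ in range(max_iterations):
        tool_calls = []
        async for event in chat.stream(history, config):
            if isinstance(event, ToolCallEvent):
                tool_calls.append(event)
        if not tool_calls:
            return history            # LLMDoneEvent with no tools: final answer
        for call in tool_calls:
            result = run_tool(call)   # stand-in for ToolRunner.execute(args)
            history.append(tool_result_message(call, result))  # appends a ToolResultBlock
    raise RuntimeError("max_iterations exceeded")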
phoson_llm
LLM normalization layer that wraps provider SDKs and returns a single typed event stream regardless of provider. All adapters extend BaseLLMChat.
Providers
| Provider | Class | Notes |
|---|---|---|
| Anthropic | AnthropicChat | Native thinking, tool use, prompt caching |
| OpenAI | OpenAIChat | Tool use, reasoning_effort for o1/o3 |
| OpenRouter | OpenAIChat(base_url=...) | 300+ models via OpenAI-compatible API |
| Ollama | OpenAIChat(base_url="http://localhost:11434/v1") | Local inference, api_key="ollama" |
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.chats.anthropic import AnthropicChat
from phoson_llm.chats.openrouter import OpenRouterChat
# OpenAI
chat = OpenAIChat(api_key="sk-...")
# Anthropic
chat = AnthropicChat(api_key="sk-ant-...")
# OpenRouter (OpenAI-compatible)
chat = OpenRouterChat(api_key="sk-or-...")
# Ollama (local)
chat = OpenAIChat(base_url="http://localhost:11434/v1", api_key="ollama")LLM Events
All providers emit the same normalized LLMEvent subclasses in a guaranteed order.
| Event | Key Fields | Description |
|---|---|---|
| LLMStartEvent | model, message_count | Call started |
| TokenEvent | content: str | Text fragment token-by-token |
| ReasoningStartEvent | — | Model started extended thinking |
| ReasoningTokenEvent | content: str | Reasoning fragment |
| ReasoningDoneEvent | content: str | Complete reasoning block |
| ToolCallDeltaEvent | index, tool_name, args_chunk | Partial tool args (real-time UI) |
| ToolCallEvent | index, tool_call_id, tool_name, args | Complete tool call ready to execute |
| UsageEvent | model, usage, cost_usd, cost_known | Tokens + cost in USD |
| LLMDoneEvent | content, has_tool_calls | Full assembled text (always last) |
| ErrorEvent | message, code, retryable | Error with retry hint |
from phoson_llm.schemas import Message, ModelConfig, TokenEvent, UsageEvent, LLMDoneEvent
chat = OpenAIChat()
messages = [Message(role="user", content="Hello!")]
config = ModelConfig(model="openai/gpt-4o-mini", max_tokens=256)
async for event in chat.stream(messages, config):
match event:
case TokenEvent(content=c):
print(c, end="", flush=True)
case UsageEvent(cost_usd=cost):
print(f"\nCost: ${cost:.6f}")
case LLMDoneEvent():
print("\n[done]")Schemas
Schemas
All input/output types are plain Python dataclasses in phoson_llm.schemas.
from phoson_llm.schemas import (
Message, # role: "system" | "user" | "assistant", content: str | list[ContentBlock]
ModelConfig, # model, temperature, max_tokens, system, thinking_budget, reasoning_effort
ToolDefinition, # name, description, parameters (JSON Schema)
# Content blocks for multimodal messages:
TextBlock, # text: str
ImageBlock, # source: url | base64 | file://, detail, media_type
AudioBlock, # source, format, duration_ms
VideoBlock, # source, sampling_interval_ms
DocumentBlock, # source (PDF), pages
ToolUseBlock, # tool_call_id, tool_name, args
ToolResultBlock, # tool_call_id, result, error
)
# Example: multimodal message with image
msg = Message(
role="user",
content=[
TextBlock(text="What's in this image?"),
ImageBlock(source="https://example.com/photo.jpg", detail="high"),
]
)
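The loop records tool round-trips in history with these same blocks. A sketch of one such message pair (field values are illustrative; placing the ToolResultBlock in a user message follows the Anthropic convention and is an assumption):

from phoson_llm.schemas import Message, ToolUseBlock, ToolResultBlock

history = [
    Message(role="assistant", content=[
        ToolUseBlock(tool_call_id="call_1", tool_name="get_weather",
                     args={"city": "Querétaro"}),
    ]),
    Message(role="user", content=[
        ToolResultBlock(tool_call_id="call_1",
                        result='{"condition": "sunny", "temperature_c": 27}',
                        error=None),
    ]),
]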
ModelConfig
| Field | Type | Default | Description |
|---|---|---|---|
| model | str | — | Model ID, e.g. "openai/gpt-4o" |
| temperature | float | 0.7 | Sampling temperature (0.0–2.0) |
| max_tokens | int | 32 768 | Maximum tokens to generate |
| system | str | None | None | System prompt to prepend |
| thinking_budget | int | None | None | Token budget for Anthropic thinking |
| reasoning_effort | "low" | "medium" | "high" | None | None | OpenAI o1/o3 reasoning effort |
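Two configs that exercise the provider-specific fields (model IDs and values are illustrative):

from phoson_llm.schemas import ModelConfig

# Anthropic extended thinking
config = ModelConfig(
    model="anthropic/claude-sonnet-4-6",
    max_tokens=2048,
    system="You are a concise assistant.",
    thinking_budget=1024,       # Anthropic models only
)

# OpenAI reasoning effort
config = ModelConfig(
    model="openai/o3-mini",
    reasoning_effort="high",    # OpenAI o1/o3 models only
)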
Pricing
Built-in pricing module for USD cost calculation. Supports Anthropic, OpenAI, and Google Gemini. Unknown models return cost_known=False.
from phoson_llm.pricing import calculate_cost
cost_usd, cost_known = calculate_cost(
model="openai/gpt-4o",
input_tokens=1000,
output_tokens=500,
cache_read_tokens=200,
)
# cost_usd ≈ 0.0035, cost_known=True
# Silence warnings for unknown models (e.g. Ollama)
import warnings
from phoson_llm.pricing import UnknownModelWarning
warnings.filterwarnings("ignore", category=UnknownModelWarning)
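Credits follow directly from USD cost via the engine's phoson_weight multiplier (credits = cost_usd × phoson_weight; see the AgentEngine constructor below). A worked example:

cost_usd = 0.0035                    # e.g. from calculate_cost above
phoson_weight = 1.2                  # AgentEngine constructor argument
credits = cost_usd * phoson_weight   # 0.0042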
phoson_agent
Stateless-by-run agent orchestration with tool execution, middleware hooks, plugin system, and branchable session storage.
AgentEngine
Main entry point. Single-flight: one concurrent run per instance. Use separate instances for parallel runs.
from phoson_agent import AgentEngine
from phoson_llm.chats.anthropic import AnthropicChat
from phoson_llm.schemas import Message, ModelConfig
engine = AgentEngine(
chat=AnthropicChat(),
tools=[], # list[AgentTool]
middlewares=[], # list[AgentMiddleware]
plugins=[], # list[str | dict | Plugin]
phoson_weight=1.2, # credits = cost_usd * phoson_weight
max_iterations=12, # max ReAct iterations before error
)
# ── Async streaming (recommended) ──────────────────────────────
async for event in engine.stream(messages, config):
match event:
case AgentTokenEvent(content=c): print(c, end="")
case AgentToolStartEvent(tool_name=t): print(f"→ {t}")
case AgentDoneEvent(result=r): print(f"Cost: ${r.total_cost_usd:.6f}")
# ── Async run (collect result) ──────────────────────────────────
result = await engine.run(messages, config)
print(result.final_content)
# ── Sync run (no event loop) ────────────────────────────────────
result = engine.run_sync(messages, config)
# ── Context manager (auto-cleanup plugins) ──────────────────────
with AgentEngine(chat=chat, tools=tools, plugins=["my-plugin"]) as engine:
    result = engine.run_sync(messages, config)
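Since each engine is single-flight, run N tasks in parallel by creating N instances. A minimal sketch, reusing the imports above and assuming a config as defined earlier:

import asyncio

async def run_parallel(prompts: list[str]) -> list:
    # One engine per concurrent run (single-flight constraint)
    engines = [AgentEngine(chat=AnthropicChat(), tools=[]) for _ in prompts]
    return await asyncio.gather(*(
        eng.run([Message(role="user", content=p)], config)
        for eng, p in zip(engines, prompts)
    ))

results = asyncio.run(run_parallel(["Task A", "Task B", "Task C"]))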
AgentRunResult
| Field | Type | Description |
|---|---|---|
| final_content | str | The agent's final text response |
| history | list[Message] | Full conversation history after run |
| input_messages | list[Message] | Original input messages snapshot |
| steps | list[RunStep] | All LLM and tool steps with timing |
| total_cost_usd | float | Accumulated cost in USD |
| total_credits | float | Accumulated credits (cost × phoson_weight) |
@tool decorator
Transforms any Python function into an AgentTool with auto-generated JSON Schema from type hints. Supports sync and async functions.
from typing import Annotated, Literal
from phoson_agent import tool
# Basic tool — schema inferred from type hints
@tool
def search_web(query: str, max_results: int = 5) -> list[dict]:
"""Search the web and return results."""
return [{"url": "...", "snippet": "..."}]
# Annotated descriptions per parameter
@tool
def calculate(
expression: Annotated[str, "A mathematical expression to evaluate"],
precision: Annotated[int, "Decimal places in result"] = 2,
) -> str:
"""Evaluate a mathematical expression safely."""
    # eval with builtins stripped so the expression can't execute arbitrary code
    return str(round(eval(expression, {"__builtins__": {}}, {}), precision))
# Literal enum constraint
@tool
def set_language(lang: Literal["en", "es", "fr", "de"]) -> str:
"""Set the response language."""
return f"Language set to {lang}"
# Async tool
@tool
async def fetch_url(url: str) -> str:
"""Fetch the content of a URL."""
import httpx
async with httpx.AsyncClient() as client:
r = await client.get(url)
return r.text
# Context injection — inject values from AgentContext
@tool(inject=["user_id"])
def get_user_profile(*, user_id: str) -> dict:
"""Get the profile of the current user."""
return {"id": user_id, "name": "Alice"}Agent Events
Agent Events
Events yielded by engine.stream(). All have a timestamp field.
| Event | Key Fields |
|---|---|
| AgentStartEvent | model, message_count, max_iterations |
| AgentTokenEvent | content: str |
| AgentReasoningEvent | content: str |
| AgentToolStartEvent | index, tool_call_id, tool_name, args, label |
| AgentToolDoneEvent | index, tool_call_id, tool_name, result, error, duration_ms |
| AgentStepDoneEvent | step: RunStep |
| AgentDoneEvent | result: AgentRunResult |
| AgentErrorEvent | message, code, retryable |
| AgentSubagentResult | index, task, result, cost_usd, credits, duration_ms |
Middleware
Extend the agent lifecycle with hooks. Override only the methods you need.
from phoson_agent.middleware import AgentMiddleware
from phoson_llm.schemas import Message, ModelConfig, ToolCallEvent
class LoggingMiddleware(AgentMiddleware):
"""Logs every LLM call and tool execution."""
async def on_before_llm(self, messages: list[Message], config: ModelConfig) -> list[Message]:
print(f"[LLM] → {config.model} ({len(messages)} messages)")
return messages # must return messages (can modify them)
async def on_before_tool(self, call: ToolCallEvent) -> ToolCallEvent | None:
print(f"[Tool] → {call.tool_name}({call.args})")
return call # return None to cancel the tool call
async def on_after_tool(self, call: ToolCallEvent, result: str, error: bool) -> str:
print(f"[Tool] ← {call.tool_name}: {'ERROR' if error else 'OK'}")
return result # can transform the result
engine = AgentEngine(chat=chat, tools=tools, middlewares=[LoggingMiddleware()])
Built-in: RetryMiddleware
from phoson_agent.middleware import RetryMiddleware
retry = RetryMiddleware(
max_retries=2,
base_delay_seconds=0.5,
backoff_multiplier=2.0,
)
engine = AgentEngine(chat=chat, tools=tools, middlewares=[retry])
Built-in: SummarizationMiddleware
Automatically compacts conversations that exceed a configurable percentage of the model's context window using a summary generated by the LLM itself.
from phoson_agent.plugins.summarizer import SummarizationMiddleware
summarizer = SummarizationMiddleware(
threshold=0.80, # compact when > 80% of context window used
min_keep_messages=4, # always preserve last N messages
provider="openrouter",
model="anthropic/claude-sonnet-4-6",
)
engine = AgentEngine(chat=chat, tools=tools, middlewares=[summarizer])
Plugins
Plugins bundle tools and middlewares together. Load them by package name, path, or instance. Plugins are initialized at engine construction time.
from phoson_agent.plugin import Plugin
from phoson_agent.middleware import AgentMiddleware
from phoson_agent.models import AgentTool
class MyPlugin(Plugin):
@property
def name(self) -> str:
return "my-plugin"
def get_tools(self) -> list[AgentTool]:
return [my_tool]
def get_middlewares(self) -> list[AgentMiddleware]:
return [LoggingMiddleware()]
def initialize(self) -> None:
print("Plugin initialized")
def cleanup(self) -> None:
print("Plugin cleaned up")
# Load by instance, package string, or dict with config
engine = AgentEngine(
chat=chat,
plugins=[
MyPlugin(), # instance
"phoson_plugin_mcp", # package string
{"name": "my-plugin", "config": {"key": "v"}}, # dict with config
],
)
Sessions
Unlike linear chat histories, Phoson uses a ConversationTree — a branchable tree of messages where you can explore different conversation paths and return to earlier branches.
from phoson_agent.sessions import ConversationTree, JsonlStorage
from phoson_llm.schemas import Message
# Create a new session tree
tree = ConversationTree.new(session_id="my-session")
# Append messages
node_a = tree.append(parent_id=None, message=Message(role="user", content="Hello"))
node_b = tree.append(parent_id=node_a.id, message=Message(role="assistant", content="Hi!"))
# Branch from node_a to explore a different path
node_c = tree.append(parent_id=node_a.id, message=Message(role="assistant", content="Hey there!"))
# Get the full message path to any node
path = tree.get_path(node_b.id) # [user: "Hello", assistant: "Hi!"]
# Label a node
tree.label(node_b.id, "main branch")
# Persist with JSONL storage
storage = JsonlStorage(base_dir="~/.phoson/sessions")
await storage.save(tree)
loaded = await storage.load("my-session")
# List all sessions
sessions = await storage.list_sessions() # list[SessionMeta]
# Delete a session
await storage.delete("my-session")
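Because get_path returns a plain list[Message], any branch can feed directly back into the engine. A sketch reusing an engine and config from the AgentEngine section:

# Continue the conversation down the alternative branch rooted at node_c
branch = tree.get_path(node_c.id)    # [user: "Hello", assistant: "Hey there!"]
branch.append(Message(role="user", content="Tell me more."))
result = engine.run_sync(messages=branch, config=config)
tree.append(parent_id=node_c.id, message=branch[-1])   # record the new turn in the tree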
phoson_cli
Interactive REPL for live agent sessions with real-time streaming, session branching, model switching, and multimodal file attachments.
# Start the REPL
uv run phoson-cli
# Run the setup wizard
uv run phoson-cli --setup
REPL Commands
| Command | Description |
|---|---|
| /new | Start a new session |
| /model <name> | Switch model (e.g. /model claude-sonnet-4-6) |
| /tree | Visualize the current conversation tree |
| /sessions | List and load saved sessions |
| /branch | Branch from the current node |
| /label <text> | Label the current conversation node |
| /clear | Clear the screen |
| /exit, /quit | Exit the REPL |
| /help | Show all available commands |
MCP Plugin
phoson_plugin_mcp integrates Model Context Protocol servers with the agent engine. Supports STDIO, SSE, and HTTP transports.
Install MCP dependency
uv add mcp
Configuration (phoson-mcp.json)
{
"mcpServers": {
"filesystem": {
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
},
"my-sse-server": {
"transport": "sse",
"url": "http://localhost:3000/sse",
"headers": { "Authorization": "Bearer token" }
},
"my-http-server": {
"transport": "http",
"url": "http://localhost:3000/mcp"
}
}
}
Usage
from phoson_agent import AgentEngine
from phoson_plugin_mcp import MCPPlugin
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.schemas import Message, ModelConfig
# Load MCP plugin — tools are auto-discovered from phoson-mcp.json
mcp_plugin = MCPPlugin()
engine = AgentEngine(
chat=OpenAIChat(),
plugins=[mcp_plugin],
)
result = engine.run_sync(
messages=[Message(role="user", content="List files in /tmp")],
config=ModelConfig(model="openai/gpt-4o-mini", max_tokens=512),
)
print(result.final_content)
Environment Variables
Set provider API keys in your environment or a .env file.
# Required for each provider you use:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=sk-or-...
# Optional: Ollama runs locally, no key needed
# OLLAMA_BASE_URL=http://localhost:11434/v1
CI & Security
Two GitHub Actions workflows keep the codebase healthy on every PR and push to main.
ci.yml
- ruff format check
- ruff lint
- python -m compileall
- pytest (unit + integration)
security.yml
- pip-audit (dependency audit)
- Secret scanning (gitleaks)
- Runs on PRs, main pushes
- Weekly scheduled scan
Commit message format
Conventional Commits enforced via a commit-msg pre-commit hook.
feat: add streaming chat abstraction
fix: handle unknown model pricing fallback
docs: update README with session examples
refactor: extract ToolRunner from AgentEngine
test: add edge cases for context window plugin
chore: update pre-commit hook versions
📦 Core dependencies
- anthropic ≥0.97
- openai ≥2.32
- httpx ≥0.28
- prompt-toolkit ≥3.0
- rich ≥15.0
- tiktoken ≥0.9