phoson-engine-minimal
v0.1.0 · Open Source · MIT
Minimal Python runtime for the Phoson autonomous-agent platform. Framework-free ReAct loop built directly on provider SDKs — no LangChain, no LangGraph. Full control over streaming, tool execution, cost tracking, and session branching.
Why Phoson Engine?
| Traditional Frameworks | Phoson Engine |
|---|---|
| Heavy dependencies | Zero external agent frameworks |
| Linear conversations | Branchable conversation trees |
| Black-box streaming | Full typed event visibility |
| Fixed ReAct patterns | Custom ReAct loop |
| Enterprise pricing | MIT licensed & open source |
Installation
Clone the repository and install dependencies with uv.
git clone https://github.com/phoson-lat/phoson-engine-minimal.git
cd phoson-engine-minimal
# Install all dependencies (including dev)
uv sync --dev --locked
# Install git hooks (optional but recommended)
uv run pre-commit install --install-hooks
uv run pre-commit install --hook-type commit-msg
uv run pre-commit install --hook-type pre-push
Run checks locally
uv run ruff format --check .
uv run ruff check .
uv run python -m compileall phoson_llm phoson_agent phoson_cli
uv run pytest -q
Quick Start
Run a minimal agent with a tool in under 20 lines.
from phoson_agent import AgentEngine, tool
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.schemas import Message, ModelConfig
# 1. Define a tool with the @tool decorator
@tool
def get_weather(city: str, country: str = "MX") -> dict:
"""Returns current weather for a given city."""
return {"city": city, "condition": "sunny", "temperature_c": 27}
# 2. Create the engine
engine = AgentEngine(
chat=OpenAIChat(), # reads OPENAI_API_KEY from env
tools=[get_weather],
phoson_weight=1.2, # credits multiplier
)
# 3. Run synchronously
result = engine.run_sync(
messages=[Message(role="user", content="What's the weather in Querétaro?")],
config=ModelConfig(model="openai/gpt-4o-mini", max_tokens=512),
)
print(result.final_content)
print(f"Cost: ${result.total_cost_usd:.6f} | Credits: {result.total_credits:.4f}")Interactive CLI
Launch the interactive REPL for live agent sessions with streaming and session branching.
# Start the interactive REPL
uv run phoson-cli
# Run the setup wizard (configure provider & model)
uv run phoson-cli --setup
Architecture
The engine is composed of three independent Python packages that can be used separately or together.
- phoson_llm: LLM normalization layer. Provider adapters + typed event stream + pricing.
- phoson_agent: ReAct agent loop, tool execution, middleware hooks, plugin system, session trees.
- phoson_cli: Interactive REPL with streaming, branching, model switching, and session picker.
Runtime loop
The ReAct loop iterates up to max_iterations times. Each iteration calls the LLM, executes any tool calls, and feeds results back until the model produces a final answer.
Client → AgentEngine.run(messages, config)
└─ AgentLoop.run_iteration()
├─ LLM.stream() → TokenEvent / ToolCallEvent / UsageEvent / LLMDoneEvent
│
├─ [if ToolCallEvent]
│ └─ ToolRunner.execute(args) → result / error
│ └─ history += ToolResultBlock
│ └─ continue loop
│
└─ [if LLMDoneEvent, no tools]
    └─ AgentDoneEvent → AgentRunResult
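The same flow as a schematic Python sketch. This is illustrative only: chat, run_tool, and tool_result_message are stand-ins for the engine's internals (AgentLoop, ToolRunner), not public API.

# Schematic only: the real loop lives in AgentLoop / ToolRunner.
async def react_loop(messages, config, max_iterations=12):
    history = list(messages)
    for _ in range(max_iterations):
        tool_calls = []
        async for event in chat.stream(history, config):
            if isinstance(event, ToolCallEvent):
                tool_calls.append(event)
        if not tool_calls:
            return history            # LLMDoneEvent with no tools: final answer
        for call in tool_calls:
            result = run_tool(call)   # stand-in for ToolRunner.execute(args)
            history.append(tool_result_message(call, result))  # appends a ToolResultBlock
    raise RuntimeError("max_iterations exceeded")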
phoson_llm
LLM normalization layer that wraps provider SDKs and returns a single typed event stream regardless of provider. All adapters extend BaseLLMChat.
Providers
| Provider | Class | Notes |
|---|---|---|
| Anthropic | AnthropicChat | Native thinking, tool use, prompt caching |
| OpenAI | OpenAIChat | Tool use, reasoning_effort for o1/o3 |
| OpenRouter | OpenAIChat(base_url=...) | 300+ models via OpenAI-compatible API |
| Ollama | OpenAIChat(base_url="http://localhost:11434/v1") | Local inference, api_key="ollama" |
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.chats.anthropic import AnthropicChat
from phoson_llm.chats.openrouter import OpenRouterChat
# OpenAI
chat = OpenAIChat(api_key="sk-...")
# Anthropic
chat = AnthropicChat(api_key="sk-ant-...")
# OpenRouter (OpenAI-compatible)
chat = OpenRouterChat(api_key="sk-or-...")
# Ollama (local)
chat = OpenAIChat(base_url="http://localhost:11434/v1", api_key="ollama")LLM Events
All providers emit the same normalized LLMEvent subclasses in a guaranteed order.
| Event | Key Fields | Description |
|---|---|---|
| LLMStartEvent | model, message_count | Call started |
| TokenEvent | content: str | Text fragment token-by-token |
| ReasoningStartEvent | — | Model started extended thinking |
| ReasoningTokenEvent | content: str | Reasoning fragment |
| ReasoningDoneEvent | content: str | Complete reasoning block |
| ToolCallDeltaEvent | index, tool_name, args_chunk | Partial tool args (real-time UI) |
| ToolCallEvent | index, tool_call_id, tool_name, args | Complete tool call ready to execute |
| UsageEvent | model, usage, cost_usd, cost_known | Tokens + cost in USD |
| LLMDoneEvent | content, has_tool_calls | Full assembled text (always last) |
| ErrorEvent | message, code, retryable | Error with retry hint |
from phoson_llm.schemas import Message, ModelConfig, TokenEvent, UsageEvent, LLMDoneEvent
chat = OpenAIChat()
messages = [Message(role="user", content="Hello!")]
config = ModelConfig(model="openai/gpt-4o-mini", max_tokens=256)
async for event in chat.stream(messages, config):
match event:
case TokenEvent(content=c):
print(c, end="", flush=True)
case UsageEvent(cost_usd=cost):
print(f"\nCost: ${cost:.6f}")
case LLMDoneEvent():
print("\n[done]")Schemas
Schemas
All input/output types are plain Python dataclasses in phoson_llm.schemas.
from phoson_llm.schemas import (
Message, # role: "system" | "user" | "assistant", content: str | list[ContentBlock]
ModelConfig, # model, temperature, max_tokens, system, thinking_budget, reasoning_effort
ToolDefinition, # name, description, parameters (JSON Schema)
# Content blocks for multimodal messages:
TextBlock, # text: str
ImageBlock, # source: url | base64 | file://, detail, media_type
AudioBlock, # source, format, duration_ms
VideoBlock, # source, sampling_interval_ms
DocumentBlock, # source (PDF), pages
ToolUseBlock, # tool_call_id, tool_name, args
ToolResultBlock, # tool_call_id, result, error
)
# Example: multimodal message with image
msg = Message(
role="user",
content=[
TextBlock(text="What's in this image?"),
ImageBlock(source="https://example.com/photo.jpg", detail="high"),
]
)
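The loop records tool round-trips in history with these same blocks. A sketch of one such message pair (field values are illustrative; placing the ToolResultBlock in a user message follows the Anthropic convention and is an assumption):

from phoson_llm.schemas import Message, ToolUseBlock, ToolResultBlock

history = [
    Message(role="assistant", content=[
        ToolUseBlock(tool_call_id="call_1", tool_name="get_weather",
                     args={"city": "Querétaro"}),
    ]),
    Message(role="user", content=[
        ToolResultBlock(tool_call_id="call_1",
                        result='{"condition": "sunny", "temperature_c": 27}',
                        error=None),
    ]),
]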
ModelConfig
| Field | Type | Default | Description |
|---|---|---|---|
| model | str | — | Model ID, e.g. "openai/gpt-4o" |
| temperature | float | 0.7 | Sampling temperature (0.0–2.0) |
| max_tokens | int | 32 768 | Maximum tokens to generate |
| system | str | None | None | System prompt to prepend |
| thinking_budget | int | None | None | Token budget for Anthropic thinking |
| reasoning_effort | "low" | "medium" | "high" | None | None | OpenAI o1/o3 reasoning effort |
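Two configs that exercise the provider-specific fields (model IDs and values are illustrative):

from phoson_llm.schemas import ModelConfig

# Anthropic extended thinking
config = ModelConfig(
    model="anthropic/claude-sonnet-4-6",
    max_tokens=2048,
    system="You are a concise assistant.",
    thinking_budget=1024,       # Anthropic models only
)

# OpenAI reasoning effort
config = ModelConfig(
    model="openai/o3-mini",
    reasoning_effort="high",    # OpenAI o1/o3 models only
)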
Pricing
Built-in pricing module for USD cost calculation. Supports Anthropic, OpenAI, and Google Gemini. Unknown models return cost_known=False.
from phoson_llm.pricing import calculate_cost
cost_usd, cost_known = calculate_cost(
model="openai/gpt-4o",
input_tokens=1000,
output_tokens=500,
cache_read_tokens=200,
)
# cost_usd ≈ 0.0035, cost_known=True
# Silence warnings for unknown models (e.g. Ollama)
import warnings
from phoson_llm.pricing import UnknownModelWarning
warnings.filterwarnings("ignore", category=UnknownModelWarning)
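Credits follow directly from USD cost via the engine's phoson_weight multiplier (credits = cost_usd × phoson_weight; see the AgentEngine constructor below). A worked example:

cost_usd = 0.0035                    # e.g. from calculate_cost above
phoson_weight = 1.2                  # AgentEngine constructor argument
credits = cost_usd * phoson_weight   # 0.0042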
phoson_agent
Stateless-by-run agent orchestration with tool execution, middleware hooks, plugin system, and branchable session storage.
AgentEngine
Main entry point. Single-flight: one concurrent run per instance. Use separate instances for parallel runs.
from phoson_agent import AgentEngine
from phoson_llm.chats.anthropic import AnthropicChat
from phoson_llm.schemas import Message, ModelConfig
engine = AgentEngine(
chat=AnthropicChat(),
tools=[], # list[AgentTool]
middlewares=[], # list[AgentMiddleware]
plugins=[], # list[str | dict | Plugin]
phoson_weight=1.2, # credits = cost_usd * phoson_weight
max_iterations=12, # max ReAct iterations before error
)
# ── Async streaming (recommended) ──────────────────────────────
async for event in engine.stream(messages, config):
match event:
case AgentTokenEvent(content=c): print(c, end="")
case AgentToolStartEvent(tool_name=t): print(f"→ {t}")
case AgentDoneEvent(result=r): print(f"Cost: ${r.total_cost_usd:.6f}")
# ── Async run (collect result) ──────────────────────────────────
result = await engine.run(messages, config)
print(result.final_content)
# ── Sync run (no event loop) ────────────────────────────────────
result = engine.run_sync(messages, config)
# ── Context manager (auto-cleanup plugins) ──────────────────────
with AgentEngine(chat=chat, tools=tools, plugins=["my-plugin"]) as engine:
    result = engine.run_sync(messages, config)
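Since each engine is single-flight, run N tasks in parallel by creating N instances. A minimal sketch, reusing the imports above and assuming a config as defined earlier:

import asyncio

async def run_parallel(prompts: list[str]) -> list:
    # One engine per concurrent run (single-flight constraint)
    engines = [AgentEngine(chat=AnthropicChat(), tools=[]) for _ in prompts]
    return await asyncio.gather(*(
        eng.run([Message(role="user", content=p)], config)
        for eng, p in zip(engines, prompts)
    ))

results = asyncio.run(run_parallel(["Task A", "Task B", "Task C"]))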
AgentRunResult
| Field | Type | Description |
|---|---|---|
| final_content | str | The agent's final text response |
| history | list[Message] | Full conversation history after run |
| input_messages | list[Message] | Original input messages snapshot |
| steps | list[RunStep] | All LLM and tool steps with timing |
| total_cost_usd | float | Accumulated cost in USD |
| total_credits | float | Accumulated credits (cost × phoson_weight) |
@tool decorator
Transforms any Python function into an AgentTool with auto-generated JSON Schema from type hints. Supports sync and async functions.
from typing import Annotated, Literal
from phoson_agent import tool
# Basic tool — schema inferred from type hints
@tool
def search_web(query: str, max_results: int = 5) -> list[dict]:
"""Search the web and return results."""
return [{"url": "...", "snippet": "..."}]
# Annotated descriptions per parameter
@tool
def calculate(
expression: Annotated[str, "A mathematical expression to evaluate"],
precision: Annotated[int, "Decimal places in result"] = 2,
) -> str:
"""Evaluate a mathematical expression safely."""
    # eval with builtins stripped so the expression can't execute arbitrary code
    return str(round(eval(expression, {"__builtins__": {}}, {}), precision))
# Literal enum constraint
@tool
def set_language(lang: Literal["en", "es", "fr", "de"]) -> str:
"""Set the response language."""
return f"Language set to {lang}"
# Async tool
@tool
async def fetch_url(url: str) -> str:
"""Fetch the content of a URL."""
import httpx
async with httpx.AsyncClient() as client:
r = await client.get(url)
return r.text
# Context injection — inject values from AgentContext
@tool(inject=["user_id"])
def get_user_profile(*, user_id: str) -> dict:
"""Get the profile of the current user."""
return {"id": user_id, "name": "Alice"}Agent Events
Agent Events
Events yielded by engine.stream(). All have a timestamp field.
| Event | Key Fields |
|---|---|
| AgentStartEvent | model, message_count, max_iterations |
| AgentTokenEvent | content: str |
| AgentReasoningEvent | content: str |
| AgentToolStartEvent | index, tool_call_id, tool_name, args, label |
| AgentToolDoneEvent | index, tool_call_id, tool_name, result, error, duration_ms |
| AgentStepDoneEvent | step: RunStep |
| AgentDoneEvent | result: AgentRunResult |
| AgentErrorEvent | message, code, retryable |
| AgentSubagentResult | index, task, result, cost_usd, credits, duration_ms |
Middleware
Extend the agent lifecycle with hooks. Override only the methods you need.
from phoson_agent.middleware import AgentMiddleware
from phoson_llm.schemas import Message, ModelConfig, ToolCallEvent
class LoggingMiddleware(AgentMiddleware):
"""Logs every LLM call and tool execution."""
async def on_before_llm(self, messages: list[Message], config: ModelConfig) -> list[Message]:
print(f"[LLM] → {config.model} ({len(messages)} messages)")
return messages # must return messages (can modify them)
async def on_before_tool(self, call: ToolCallEvent) -> ToolCallEvent | None:
print(f"[Tool] → {call.tool_name}({call.args})")
return call # return None to cancel the tool call
async def on_after_tool(self, call: ToolCallEvent, result: str, error: bool) -> str:
print(f"[Tool] ← {call.tool_name}: {'ERROR' if error else 'OK'}")
return result # can transform the result
engine = AgentEngine(chat=chat, tools=tools, middlewares=[LoggingMiddleware()])
Built-in: RetryMiddleware
from phoson_agent.middleware import RetryMiddleware
retry = RetryMiddleware(
max_retries=2,
base_delay_seconds=0.5,
backoff_multiplier=2.0,
)
engine = AgentEngine(chat=chat, tools=tools, middlewares=[retry])
Built-in: SummarizationMiddleware
Automatically compacts conversations that exceed a configurable percentage of the model's context window using a summary generated by the LLM itself.
from phoson_agent.plugins.summarizer import SummarizationMiddleware
summarizer = SummarizationMiddleware(
threshold=0.80, # compact when > 80% of context window used
min_keep_messages=4, # always preserve last N messages
provider="openrouter",
model="anthropic/claude-sonnet-4-6",
)
engine = AgentEngine(chat=chat, tools=tools, middlewares=[summarizer])
Plugins
Plugins bundle tools and middlewares together. Load them by package name, path, or instance. Plugins are initialized at engine construction time.
from phoson_agent.plugin import Plugin
from phoson_agent.middleware import AgentMiddleware
from phoson_agent.models import AgentTool
class MyPlugin(Plugin):
@property
def name(self) -> str:
return "my-plugin"
def get_tools(self) -> list[AgentTool]:
return [my_tool]
def get_middlewares(self) -> list[AgentMiddleware]:
return [LoggingMiddleware()]
def initialize(self) -> None:
print("Plugin initialized")
def cleanup(self) -> None:
print("Plugin cleaned up")
# Load by instance, package string, or dict with config
engine = AgentEngine(
chat=chat,
plugins=[
MyPlugin(), # instance
"phoson_plugin_mcp", # package string
{"name": "my-plugin", "config": {"key": "v"}}, # dict with config
],
)
Sessions
Unlike linear chat histories, Phoson uses a ConversationTree — a branchable tree of messages where you can explore different conversation paths and return to earlier branches.
from phoson_agent.sessions import ConversationTree, JsonlStorage
from phoson_llm.schemas import Message
# Create a new session tree
tree = ConversationTree.new(session_id="my-session")
# Append messages
node_a = tree.append(parent_id=None, message=Message(role="user", content="Hello"))
node_b = tree.append(parent_id=node_a.id, message=Message(role="assistant", content="Hi!"))
# Branch from node_a to explore a different path
node_c = tree.append(parent_id=node_a.id, message=Message(role="assistant", content="Hey there!"))
# Get the full message path to any node
path = tree.get_path(node_b.id) # [user: "Hello", assistant: "Hi!"]
# Label a node
tree.label(node_b.id, "main branch")
# Persist with JSONL storage
storage = JsonlStorage(base_dir="~/.phoson/sessions")
await storage.save(tree)
loaded = await storage.load("my-session")
# List all sessions
sessions = await storage.list_sessions() # list[SessionMeta]
# Delete a session
await storage.delete("my-session")
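Because get_path returns a plain list[Message], any branch can feed directly back into the engine. A sketch reusing an engine and config from the AgentEngine section:

# Continue the conversation down the alternative branch rooted at node_c
branch = tree.get_path(node_c.id)    # [user: "Hello", assistant: "Hey there!"]
branch.append(Message(role="user", content="Tell me more."))
result = engine.run_sync(messages=branch, config=config)
tree.append(parent_id=node_c.id, message=branch[-1])   # record the new turn in the tree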
phoson_cli
Interactive REPL for live agent sessions with real-time streaming, session branching, model switching, and multimodal file attachments.
# Start the REPL
uv run phoson-cli
# Run the setup wizard
uv run phoson-cli --setup
REPL Commands
| Command | Description |
|---|---|
| /new | Start a new session |
| /model <name> | Switch model (e.g. /model claude-sonnet-4-6) |
| /tree | Visualize the current conversation tree |
| /sessions | List and load saved sessions |
| /branch | Branch from the current node |
| /label <text> | Label the current conversation node |
| /clear | Clear the screen |
| /exit, /quit | Exit the REPL |
| /help | Show all available commands |
MCP Plugin
phoson_plugin_mcp integrates Model Context Protocol servers with the agent engine. Supports STDIO, SSE, and HTTP transports.
Install MCP dependency
uv add mcp
Configuration (phoson-mcp.json)
{
"mcpServers": {
"filesystem": {
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
},
"my-sse-server": {
"transport": "sse",
"url": "http://localhost:3000/sse",
"headers": { "Authorization": "Bearer token" }
},
"my-http-server": {
"transport": "http",
"url": "http://localhost:3000/mcp"
}
}
}
Usage
from phoson_agent import AgentEngine
from phoson_plugin_mcp import MCPPlugin
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.schemas import Message, ModelConfig
# Load MCP plugin — tools are auto-discovered from phoson-mcp.json
mcp_plugin = MCPPlugin()
engine = AgentEngine(
chat=OpenAIChat(),
plugins=[mcp_plugin],
)
result = engine.run_sync(
messages=[Message(role="user", content="List files in /tmp")],
config=ModelConfig(model="openai/gpt-4o-mini", max_tokens=512),
)
print(result.final_content)
Environment Variables
Set provider API keys in your environment or a .env file.
# Required for each provider you use:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=sk-or-...
# Optional: Ollama runs locally, no key needed
# OLLAMA_BASE_URL=http://localhost:11434/v1
CI & Security
Two GitHub Actions workflows keep the codebase healthy on every PR and push to main.
ci.yml
- ruff format check
- ruff lint
- python -m compileall
- pytest (unit + integration)
security.yml
- pip-audit (dependency audit)
- Secret scanning (gitleaks)
- Runs on PRs, main pushes
- Weekly scheduled scan
Commit message format
Conventional Commits enforced via a commit-msg pre-commit hook.
feat: add streaming chat abstraction
fix: handle unknown model pricing fallback
docs: update README with session examples
refactor: extract ToolRunner from AgentEngine
test: add edge cases for context window plugin
chore: update pre-commit hook versions
📦 Core dependencies
- anthropic ≥0.97
- openai ≥2.32
- httpx ≥0.28
- prompt-toolkit ≥3.0
- rich ≥15.0
- tiktoken ≥0.9