phoson-engine-minimal

v0.1.0 · Open Source · MIT

Minimal Python runtime for the Phoson autonomous-agent platform. Framework-free ReAct loop built directly on provider SDKs — no LangChain, no LangGraph. Full control over streaming, tool execution, cost tracking, and session branching.

$ git clone https://github.com/phoson-lat/phoson-engine-minimal

Python 3.12+ · Framework-free · Multi-provider · Typed events · Branchable sessions · Cost tracking

Why Phoson Engine?

Traditional Frameworks     | Phoson Engine
---------------------------|--------------------------------
Heavy dependencies         | Zero external agent frameworks
Linear conversations       | Branchable conversation trees
Black-box streaming        | Full typed event visibility
Fixed ReAct patterns       | Custom ReAct loop
Enterprise pricing         | MIT licensed & open source

Installation

Clone the repository and install dependencies with uv.

terminal
git clone https://github.com/phoson-lat/phoson-engine-minimal.git
cd phoson-engine-minimal

# Install all dependencies (including dev)
uv sync --dev --locked

# Install git hooks (optional but recommended)
uv run pre-commit install --install-hooks
uv run pre-commit install --hook-type commit-msg
uv run pre-commit install --hook-type pre-push

Run checks locally

bash
uv run ruff format --check .
uv run ruff check .
uv run python -m compileall phoson_llm phoson_agent phoson_cli
uv run pytest -q

Quick Start

Run a minimal agent with a tool in under 20 lines.

main.py
from phoson_agent import AgentEngine, tool
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.schemas import Message, ModelConfig

# 1. Define a tool with the @tool decorator
@tool
def get_weather(city: str, country: str = "MX") -> dict:
    """Returns current weather for a given city."""
    return {"city": city, "condition": "sunny", "temperature_c": 27}

# 2. Create the engine
engine = AgentEngine(
    chat=OpenAIChat(),          # reads OPENAI_API_KEY from env
    tools=[get_weather],
    phoson_weight=1.2,          # credits multiplier
)

# 3. Run synchronously
result = engine.run_sync(
    messages=[Message(role="user", content="What's the weather in Querétaro?")],
    config=ModelConfig(model="openai/gpt-4o-mini", max_tokens=512),
)

print(result.final_content)
print(f"Cost: ${result.total_cost_usd:.6f} | Credits: {result.total_credits:.4f}")

Interactive CLI

Launch the interactive REPL for live agent sessions with streaming and session branching.

bash
# Start the interactive REPL
uv run phoson-cli

# Run the setup wizard (configure provider & model)
uv run phoson-cli --setup

Architecture

The engine is composed of three independent Python packages that can be used separately or together.

phoson_llm

LLM normalization layer. Provider adapters + typed event stream + pricing.

phoson_agent

ReAct agent loop, tool execution, middleware hooks, plugin system, session trees.

phoson_cli

Interactive REPL with streaming, branching, model switching, and session picker.

Runtime loop

The ReAct loop iterates up to max_iterations times. Each iteration calls the LLM, executes any tool calls, and feeds the results back until the model produces a final answer; a code sketch follows the diagram below.

loop diagram
Client → AgentEngine.run(messages, config)
  └─ AgentLoop.run_iteration()
       ├─ LLM.stream() → TokenEvent / ToolCallEvent / UsageEvent / LLMDoneEvent
       ├─ [if ToolCallEvent]
       │    └─ ToolRunner.execute(args) → result / error
       │         ├─ history += ToolResultBlock
       │         └─ continue loop
       └─ [if LLMDoneEvent, no tools]
            └─ AgentDoneEvent → AgentRunResult
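The same control flow as a minimal Python sketch. This is illustrative only: run_tool and tool_result_message are hypothetical helpers, and the real engine's internals differ.

python
from phoson_llm.schemas import TokenEvent, ToolCallEvent

async def react_loop(llm, tools, messages, config, max_iterations=12):
    """Sketch of the runtime loop above; not the engine's actual source."""
    history = list(messages)
    for _ in range(max_iterations):
        text, tool_calls = [], []
        async for event in llm.stream(history, config):
            match event:
                case TokenEvent(content=c):
                    text.append(c)
                case ToolCallEvent() as call:
                    tool_calls.append(call)
        if not tool_calls:
            return "".join(text)                   # final answer ends the loop
        for call in tool_calls:
            result = await run_tool(tools, call)   # hypothetical helper
            history.append(tool_result_message(call, result))  # hypothetical helper
    raise RuntimeError("max_iterations exceeded")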

phoson_llm

LLM normalization layer that wraps provider SDKs and returns a single typed event stream regardless of provider. All adapters extend BaseLLMChat.
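To make the contract concrete, here is a toy adapter sketch. The module path for BaseLLMChat, and the assumption that stream() is the only required override, are guesses rather than documented API.

python
from phoson_llm.chats.base import BaseLLMChat  # assumed module path
from phoson_llm.schemas import (
    LLMDoneEvent,
    LLMStartEvent,
    Message,
    ModelConfig,
    TokenEvent,
)

class EchoChat(BaseLLMChat):
    """Toy adapter: replays the last user message as a normalized event stream."""

    async def stream(self, messages: list[Message], config: ModelConfig):
        last = str(messages[-1].content)
        yield LLMStartEvent(model=config.model, message_count=len(messages))
        for ch in last:
            yield TokenEvent(content=ch)           # text, token by token
        yield LLMDoneEvent(content=last, has_tool_calls=False)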

Providers

Provider     Adapter                       Highlights
Anthropic    AnthropicChat                 Native thinking, tool use, prompt caching
OpenAI       OpenAIChat                    Tool use, reasoning_effort for o1/o3
OpenRouter   OpenRouterChat                300+ models via an OpenAI-compatible API
Ollama       OpenAIChat(base_url="...")    Local inference

providers.py
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.chats.anthropic import AnthropicChat
from phoson_llm.chats.openrouter import OpenRouterChat

# OpenAI
chat = OpenAIChat(api_key="sk-...")

# Anthropic
chat = AnthropicChat(api_key="sk-ant-...")

# OpenRouter (OpenAI-compatible)
chat = OpenRouterChat(api_key="sk-or-...")

# Ollama (local)
chat = OpenAIChat(base_url="http://localhost:11434/v1", api_key="ollama")

LLM Events

All providers emit the same normalized LLMEvent subclasses in a guaranteed order.

Event           Fields                                  Meaning
LLMStartEvent   model, message_count                    Call started
TokenEvent      content: str                            Text fragment, token by token
ToolCallEvent   index, tool_call_id, tool_name, args    Complete tool call ready to execute
UsageEvent      model, usage, cost_usd, cost_known      Token counts + cost in USD
LLMDoneEvent    content, has_tool_calls                 Full assembled text (always last)
ErrorEvent      message, code, retryable                Error with retry hint

stream_events.py
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.schemas import (  # event classes assumed to live in schemas
    LLMDoneEvent,
    Message,
    ModelConfig,
    TokenEvent,
    UsageEvent,
)

chat = OpenAIChat()
messages = [Message(role="user", content="Hello!")]
config = ModelConfig(model="openai/gpt-4o-mini", max_tokens=256)

async for event in chat.stream(messages, config):
    match event:
        case TokenEvent(content=c):
            print(c, end="", flush=True)
        case UsageEvent(cost_usd=cost):
            print(f"\nCost: ${cost:.6f}")
        case LLMDoneEvent():
            print("\n[done]")

Schemas

All input/output types are plain Python dataclasses in phoson_llm.schemas.

schemas.py
from phoson_llm.schemas import (
    Message,          # role: "system" | "user" | "assistant", content: str | list[ContentBlock]
    ModelConfig,      # model, temperature, max_tokens, system, thinking_budget, reasoning_effort
    ToolDefinition,   # name, description, parameters (JSON Schema)
    # Content blocks for multimodal messages:
    TextBlock,        # text: str
    ImageBlock,       # source: url | base64 | file://, detail, media_type
    AudioBlock,       # source, format, duration_ms
    VideoBlock,       # source, sampling_interval_ms
    DocumentBlock,    # source (PDF), pages
    ToolUseBlock,     # tool_call_id, tool_name, args
    ToolResultBlock,  # tool_call_id, result, error
)

# Example: multimodal message with image
msg = Message(
    role="user",
    content=[
        TextBlock(text="What's in this image?"),
        ImageBlock(source="https://example.com/photo.jpg", detail="high"),
    ]
)

ModelConfig

Per-call configuration passed alongside the messages: model (e.g. "openai/gpt-4o-mini"), temperature, max_tokens, system, thinking_budget, and reasoning_effort.
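For example, using the fields listed in the schema comments above (the thinking_budget value is illustrative):

python
from phoson_llm.schemas import ModelConfig

config = ModelConfig(
    model="anthropic/claude-sonnet-4-6",
    temperature=0.2,
    max_tokens=1024,
    system="You are a concise assistant.",
    thinking_budget=2048,   # illustrative; semantics are provider-specific
)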

Pricing

Built-in pricing module for USD cost calculation. Supports Anthropic, OpenAI, and Google Gemini. Unknown models return cost_known=False.

pricing.py
from phoson_llm.pricing import calculate_cost

cost_usd, cost_known = calculate_cost(
    model="openai/gpt-4o",
    input_tokens=1000,
    output_tokens=500,
    cache_read_tokens=200,
)
# cost_usd ≈ 0.0035, cost_known=True

# Silence warnings for unknown models (e.g. Ollama)
import warnings
from phoson_llm.pricing import UnknownModelWarning
warnings.filterwarnings("ignore", category=UnknownModelWarning)
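The same flag appears on UsageEvent during streaming, so a display layer can degrade gracefully. A small sketch using the event fields listed above:

python
from phoson_llm.schemas import UsageEvent  # assumed location, as with ToolCallEvent

def format_usage(event: UsageEvent) -> str:
    """Render cost only when the pricing table knows the model."""
    if event.cost_known:
        return f"Cost: ${event.cost_usd:.6f}"
    return "Cost: unknown (model not in pricing table)"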

phoson_agent

Stateless-by-run agent orchestration with tool execution, middleware hooks, plugin system, and branchable session storage.

AgentEngine

Main entry point. Single-flight: one concurrent run per instance; use separate instances for parallel runs (see the sketch after the example below).

engine.py
from phoson_agent import (  # agent event classes assumed exported at the package root
    AgentDoneEvent,
    AgentEngine,
    AgentTokenEvent,
    AgentToolStartEvent,
)
from phoson_llm.chats.anthropic import AnthropicChat
from phoson_llm.schemas import Message, ModelConfig

engine = AgentEngine(
    chat=AnthropicChat(),
    tools=[],                  # list[AgentTool]
    middlewares=[],            # list[AgentMiddleware]
    plugins=[],                # list[str | dict | Plugin]
    phoson_weight=1.2,         # credits = cost_usd * phoson_weight
    max_iterations=12,         # max ReAct iterations before error
)

messages = [Message(role="user", content="What can you do?")]
config = ModelConfig(model="anthropic/claude-sonnet-4-6", max_tokens=512)

# ── Async streaming (recommended) ──────────────────────────────
async for event in engine.stream(messages, config):
    match event:
        case AgentTokenEvent(content=c): print(c, end="")
        case AgentToolStartEvent(tool_name=t): print(f"→ {t}")
        case AgentDoneEvent(result=r): print(f"Cost: ${r.total_cost_usd:.6f}")

# ── Async run (collect result) ──────────────────────────────────
result = await engine.run(messages, config)
print(result.final_content)

# ── Sync run (no event loop) ────────────────────────────────────
result = engine.run_sync(messages, config)

# ── Context manager (auto-cleanup plugins) ──────────────────────
with AgentEngine(chat=chat, tools=tools, plugins=["my-plugin"]) as engine:
    result = engine.run_sync(messages, config)
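Because each instance is single-flight, parallel runs take one engine per run. A minimal sketch (prompts are illustrative):

python
import asyncio

async def run_in_parallel():
    # One engine per concurrent run; a single instance allows only one run at a time.
    engine_a = AgentEngine(chat=AnthropicChat(), tools=[])
    engine_b = AgentEngine(chat=AnthropicChat(), tools=[])
    return await asyncio.gather(
        engine_a.run([Message(role="user", content="Summarize topic A")], config),
        engine_b.run([Message(role="user", content="Summarize topic B")], config),
    )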

AgentRunResult

Returned by run()/run_sync() and carried by the final AgentDoneEvent: final_content, total_cost_usd, and total_credits (cost_usd * phoson_weight).

@tool decorator

Transforms any Python function into an AgentTool, with a JSON Schema auto-generated from its type hints. Supports sync and async functions; a sketch of the derived schema follows the examples below.

tools.py
from typing import Annotated, Literal
from phoson_agent import tool

# Basic tool — schema inferred from type hints
@tool
def search_web(query: str, max_results: int = 5) -> list[dict]:
    """Search the web and return results."""
    return [{"url": "...", "snippet": "..."}]

# Annotated descriptions per parameter
@tool
def calculate(
    expression: Annotated[str, "A mathematical expression to evaluate"],
    precision: Annotated[int, "Decimal places in result"] = 2,
) -> str:
    """Evaluate a mathematical expression."""
    # Demo only: eval() is unsafe on untrusted input.
    return str(round(eval(expression), precision))

# Literal enum constraint
@tool
def set_language(lang: Literal["en", "es", "fr", "de"]) -> str:
    """Set the response language."""
    return f"Language set to {lang}"

# Async tool
@tool
async def fetch_url(url: str) -> str:
    """Fetch the content of a URL."""
    import httpx
    async with httpx.AsyncClient() as client:
        r = await client.get(url)
        return r.text

# Context injection — inject values from AgentContext
@tool(inject=["user_id"])
def get_user_profile(*, user_id: str) -> dict:
    """Get the profile of the current user."""
    return {"id": user_id, "name": "Alice"}

Agent Events

Events yielded by engine.stream(), each carrying a timestamp field. The stream includes AgentTokenEvent, AgentToolStartEvent, and a final AgentDoneEvent whose result field holds the AgentRunResult, as in the AgentEngine example above.

Middleware

Extend the agent lifecycle with hooks. Override only the methods you need.

middleware.py
from phoson_agent.middleware import AgentMiddleware
from phoson_llm.schemas import Message, ModelConfig, ToolCallEvent

class LoggingMiddleware(AgentMiddleware):
    """Logs every LLM call and tool execution."""

    async def on_before_llm(self, messages: list[Message], config: ModelConfig) -> list[Message]:
        print(f"[LLM] → {config.model} ({len(messages)} messages)")
        return messages  # must return messages (can modify them)

    async def on_before_tool(self, call: ToolCallEvent) -> ToolCallEvent | None:
        print(f"[Tool] → {call.tool_name}({call.args})")
        return call  # return None to cancel the tool call

    async def on_after_tool(self, call: ToolCallEvent, result: str, error: bool) -> str:
        print(f"[Tool] ← {call.tool_name}: {'ERROR' if error else 'OK'}")
        return result  # can transform the result

engine = AgentEngine(chat=chat, tools=tools, middlewares=[LoggingMiddleware()])

Built-in: RetryMiddleware

python
from phoson_agent.middleware import RetryMiddleware

retry = RetryMiddleware(
    max_retries=2,
    base_delay_seconds=0.5,
    backoff_multiplier=2.0,
)
engine = AgentEngine(chat=chat, tools=tools, middlewares=[retry])

Built-in: SummarizationMiddleware

Automatically compacts conversations that exceed a configurable fraction of the model's context window, using a summary generated by the LLM itself. With a 200,000-token window and threshold=0.80, for example, compaction triggers once the history passes 160,000 tokens.

python
from phoson_agent.plugins.summarizer import SummarizationMiddleware

summarizer = SummarizationMiddleware(
    threshold=0.80,            # compact when > 80% of context window used
    min_keep_messages=4,       # always preserve last N messages
    provider="openrouter",
    model="anthropic/claude-sonnet-4-6",
)
engine = AgentEngine(chat=chat, tools=tools, middlewares=[summarizer])

Plugins

Plugins bundle tools and middlewares together. Load them by package name, path, or instance. Plugins are initialized at engine construction time.

plugins.py
from phoson_agent.plugin import Plugin
from phoson_agent.middleware import AgentMiddleware
from phoson_agent.models import AgentTool

class MyPlugin(Plugin):
    @property
    def name(self) -> str:
        return "my-plugin"

    def get_tools(self) -> list[AgentTool]:
        return [my_tool]

    def get_middlewares(self) -> list[AgentMiddleware]:
        return [LoggingMiddleware()]

    def initialize(self) -> None:
        print("Plugin initialized")

    def cleanup(self) -> None:
        print("Plugin cleaned up")

# Load by instance, package string, or dict with config
engine = AgentEngine(
    chat=chat,
    plugins=[
        MyPlugin(),                                    # instance
        "phoson_plugin_mcp",                           # package string
        {"name": "my-plugin", "config": {"key": "v"}}, # dict with config
    ],
)

Sessions

Unlike linear chat histories, Phoson stores sessions as a ConversationTree: a branchable tree of messages in which you can explore alternative conversation paths and return to earlier branches.

sessions.py
from phoson_agent.sessions import ConversationTree, JsonlStorage
from phoson_llm.schemas import Message

# Create a new session tree
tree = ConversationTree.new(session_id="my-session")

# Append messages
node_a = tree.append(parent_id=None, message=Message(role="user", content="Hello"))
node_b = tree.append(parent_id=node_a.id, message=Message(role="assistant", content="Hi!"))

# Branch from node_a to explore a different path
node_c = tree.append(parent_id=node_a.id, message=Message(role="assistant", content="Hey there!"))

# Get the full message path to any node
path = tree.get_path(node_b.id)  # [user: "Hello", assistant: "Hi!"]

# Label a node
tree.label(node_b.id, "main branch")

# Persist with JSONL storage
storage = JsonlStorage(base_dir="~/.phoson/sessions")
await storage.save(tree)
loaded = await storage.load("my-session")

# List all sessions
sessions = await storage.list_sessions()  # list[SessionMeta]

# Delete a session
await storage.delete("my-session")
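One way to wire a tree into the engine is to run on the path to a node and append the reply as a child. A sketch, assuming an engine and config as in the AgentEngine section (the CLI handles this wiring for you):

python
# Continue the branch at node_c: run the agent on its path, append the answer.
path = tree.get_path(node_c.id)          # list of Messages up to node_c
result = engine.run_sync(messages=path, config=config)
tree.append(
    parent_id=node_c.id,
    message=Message(role="assistant", content=result.final_content),
)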

phoson_cli

Interactive REPL for live agent sessions with real-time streaming, session branching, model switching, and multimodal file attachments.

bash
# Start the REPL
uv run phoson-cli

# Run the setup wizard
uv run phoson-cli --setup

REPL Commands

/new             Start a new session
/model <name>    Switch model
/tree            Visualize the conversation tree
/sessions        List saved sessions
/branch          Branch from the current node
/label <text>    Label the current node

MCP Plugin

phoson_plugin_mcp integrates Model Context Protocol servers with the agent engine. Supports STDIO, SSE, and HTTP transports.

Install MCP dependency

bash
uv add mcp

Configuration (phoson-mcp.json)

phoson-mcp.json
{
  "mcpServers": {
    "filesystem": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    },
    "my-sse-server": {
      "transport": "sse",
      "url": "http://localhost:3000/sse",
      "headers": { "Authorization": "Bearer token" }
    },
    "my-http-server": {
      "transport": "http",
      "url": "http://localhost:3000/mcp"
    }
  }
}

Usage

mcp_usage.py
from phoson_agent import AgentEngine
from phoson_plugin_mcp import MCPPlugin
from phoson_llm.chats.openai import OpenAIChat
from phoson_llm.schemas import Message, ModelConfig

# Load MCP plugin — tools are auto-discovered from phoson-mcp.json
mcp_plugin = MCPPlugin()

engine = AgentEngine(
    chat=OpenAIChat(),
    plugins=[mcp_plugin],
)

result = engine.run_sync(
    messages=[Message(role="user", content="List files in /tmp")],
    config=ModelConfig(model="openai/gpt-4o-mini", max_tokens=512),
)
print(result.final_content)

Environment Variables

Set provider API keys in your environment or a .env file.

.env
# Required for each provider you use:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=sk-or-...

# Optional: Ollama runs locally, no key needed
# OLLAMA_BASE_URL=http://localhost:11434/v1

Security: Never commit API keys to source control. Use environment variables or a secrets manager. The .github/workflows/security.yml workflow runs pip-audit and secret scanning on every PR.

CI & Security

Two GitHub Actions workflows keep the codebase healthy on every PR and push to main.

ci.yml

  • ruff format check
  • ruff lint
  • python -m compileall
  • pytest (unit + integration)

security.yml

  • pip-audit (dependency audit)
  • Secret scanning (gitleaks)
  • Runs on PRs, main pushes
  • Weekly scheduled scan

Commit message format

Conventional Commits enforced via a commit-msg pre-commit hook.

bash
feat: add streaming chat abstraction
fix: handle unknown model pricing fallback
docs: update README with session examples
refactor: extract ToolRunner from AgentEngine
test: add edge cases for context window plugin
chore: update pre-commit hook versions

Core dependencies

anthropic ≥0.97 · openai ≥2.32 · httpx ≥0.28 · prompt-toolkit ≥3.0 · rich ≥15.0 · tiktoken ≥0.9