Which AI Model Should Run Your Paperclip CEO Agent?

Your CEO agent is the most important hire in a Paperclip company. Here’s how to pick the right model — and how to structure your org to get the most out of memory.

When you build a Paperclip company, the first question is always: who runs the company?

The CEO agent is your top-of-org-chart thinker. It translates the company mission into goals, creates projects, delegates tasks, handles escalations, and keeps humans in the loop at the right moments. Get this role right and the whole org scales. Get it wrong and you get expensive hallucination loops that look like work but accomplish nothing.

The second question is closely related: how do you keep the org’s memory from degrading as it grows?

Both questions have real answers. Let’s go through them.

The CEO’s Actual Job in Paperclip

Before picking a model, it helps to be precise about what a CEO agent does:

Translates mission into goals — turns your company mission statement into structured goals with measurable outcomes
Creates and assigns projects — breaks goals into projects, creates issues, assigns them to the right agents
Reviews escalations — reads blocked task comments and decides whether to reassign, clarify, or escalate to the board
Manages governance — initiates approval requests for hires, budget changes, or anything that requires board sign-off
Communicates up — keeps the board (you) informed through comments and status updates

Notice what’s not on this list: executing tasks directly. A good CEO agent delegates. Its job is planning, routing, and governance — not first-order execution.
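To make the escalation-review duty concrete, here is a minimal sketch of the three outcomes as a routing function. Everything in it is an assumption for illustration: the `Escalation` fields, the `Action` names, and the rules are hypothetical, not Paperclip's data model. In a real org the CEO model makes this judgment from the comment text itself.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REASSIGN = "reassign"
    CLARIFY = "clarify"
    ESCALATE_TO_BOARD = "escalate_to_board"


@dataclass
class Escalation:
    # Hypothetical fields for illustration; not Paperclip's schema.
    issue_id: str
    needs_board_approval: bool  # e.g. a hire or a budget change
    requirements_unclear: bool  # the assignee cannot tell what "done" means


def triage(esc: Escalation) -> Action:
    """Pick one of the three outcomes available for a blocked task."""
    if esc.needs_board_approval:
        return Action.ESCALATE_TO_BOARD
    if esc.requirements_unclear:
        return Action.CLARIFY
    # Otherwise assume the blocker is a skills or capacity mismatch.
    return Action.REASSIGN
```

Governance is checked first on purpose: anything that needs board sign-off should not be quietly reassigned.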

This distinction shapes model selection.

Model Comparison: Claude, GPT-5, Hermes

Claude Opus 4.6 (Anthropic)

Best for: Complex reasoning, long-context synthesis, nuanced governance decisions

Claude Opus 4.6 offers some of the sharpest long-context reasoning available today. With a 1M token context window, it can hold an entire company’s task history, goal structure, and escalation chain in context simultaneously, which models with smaller context windows cannot do.

For CEO-level work, the reasoning quality matters more than speed. You want a model that can read a blocked task comment, understand why it’s blocked, and route it correctly — not just pattern-match to a template response.

Memory profile: Claude Code’s AutoMemory system persists preferences, decisions, and context across sessions. For the CEO role specifically, this means it remembers your governance preferences, how you like escalations formatted, and which delegation patterns have worked before. That’s meaningful for a role where consistency of judgment matters.

Tradeoff: It’s the most expensive option on a per-token basis. For CEO-level work, the cost is usually justified — CEO agents make fewer but higher-stakes calls than ICs. But if your CEO is handling high-volume routine tasks, the cost adds up.

GPT-5 / OpenAI Codex

Best for: Speed, breadth, organizations with mixed tool ecosystems

GPT-5 is fast and broad. For a CEO agent that needs to make quick routing decisions across a high-volume task queue, it holds its own. It’s also a natural fit if your company already uses OpenAI tooling elsewhere and you want a consistent API surface.

OpenAI’s Agents SDK supports role-based agent definitions, which maps cleanly to Paperclip’s CEO/manager/IC structure. You can define the CEO’s goals, escalation behavior, and governance rules in a structured way that travels well across sessions.

Memory profile: OpenAI’s memory is session-based by default. For sustained CEO operation, you’ll need to implement explicit memory management — either through Paperclip’s issue/comment thread (which is naturally persistent), or through an external memory store your CEO agent reads on each heartbeat. This is solvable, but it requires intentional design.
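The external memory store mentioned above can be as simple as a JSON file the agent loads at the start of each heartbeat and rewrites at the end. This is a minimal sketch under that assumption; the file layout, the `decisions` and `heartbeats` keys, and the function names are all hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def load_memory(path: Path) -> dict:
    """Return saved memory, or an empty structure on the first heartbeat."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"decisions": [], "heartbeats": 0}


def save_memory(path: Path, memory: dict) -> None:
    """Persist memory so the next session can pick it up."""
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")


def heartbeat(path: Path, new_decision: Optional[str] = None) -> dict:
    """One cycle: load state, record anything new, write it back."""
    memory = load_memory(path)
    memory["heartbeats"] += 1
    if new_decision:
        memory["decisions"].append(new_decision)
    save_memory(path, memory)
    return memory
```

Because the state lives outside the model session, a fresh session on the next heartbeat still sees every earlier decision.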

Tradeoff: GPT-5 is excellent for breadth but can miss subtle context in complex escalation scenarios. For companies where governance edge cases matter — approval chains, cross-team conflicts, budget disputes — Claude’s reasoning quality tends to produce better outcomes.

Hermes (Nous Research)

Best for: Memory-heavy orgs, customer-facing CEO scenarios, high session continuity requirements

Hermes was built around the memory problem. Its multi-level persistent memory system addresses the forgetfulness that plagues most other runtimes: it can remember not just what happened in this session but what happened across weeks of operation, building an organizational memory that compounds over time.

For a CEO agent that needs to track goal progress across long time horizons, remember why certain decisions were made, and maintain context about team dynamics and past escalations, Hermes’ memory architecture is a genuine differentiator.

Memory profile: Hermes maintains memory at multiple levels: session, task, and persistent long-term. The long-term layer means your CEO agent doesn’t start from scratch on each heartbeat. It builds up a working model of the company that gets richer over time. This is especially valuable for companies doing real work over months, not just demos.

Tradeoff: Hermes is newer and the ecosystem is smaller. If you hit an edge case with Hermes’ CEO behavior, you will find fewer community resources for debugging it than you would for Claude or GPT-5.

Org Structure Recommendations

The CEO model matters, but org structure often matters more. Here are the patterns that work.

The Flat Startup (2-5 agents)

Board (you)
└── CEO Agent
    ├── Engineering Agent
    └── Content Agent

At small scale, the CEO agent directly manages all ICs. Keep the hierarchy shallow: every layer of management adds latency and token cost. The CEO creates goals, creates issues, assigns them to the IC agents, and reviews outputs.

Memory consideration: At this scale, Paperclip’s own task/comment history is usually sufficient organizational memory. The CEO can read recent issue comments to reconstruct context. You don’t need exotic memory tooling yet.

The Functional Org (5-15 agents)

Board (you)
└── CEO Agent
    ├── Engineering Manager
    │   ├── Frontend Agent
    │   ├── Backend Agent
    │   └── QA Agent
    ├── Marketing Manager
    │   ├── Content Writer
    │   └── SEO Agent
    └── Operations Agent

At this scale, the CEO should not be reading individual task comments. Its job is goal-level: are goals on track, are managers escalating correctly, are approvals flowing. Use managers to filter signal.
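One way to see why managers filter signal is to write the chart above as data and count what the CEO actually touches. The nested-dict representation and helper names below are illustrative assumptions, not a Paperclip format.

```python
# The functional org chart from above as a nested dict: name -> direct reports.
ORG = {
    "CEO Agent": {
        "Engineering Manager": {
            "Frontend Agent": {},
            "Backend Agent": {},
            "QA Agent": {},
        },
        "Marketing Manager": {
            "Content Writer": {},
            "SEO Agent": {},
        },
        "Operations Agent": {},
    }
}


def count_agents(tree: dict) -> int:
    """Total number of agents in the tree, including managers."""
    return sum(1 + count_agents(reports) for reports in tree.values())


def depth(tree: dict) -> int:
    """Number of layers in the tree; an empty tree has depth 0."""
    if not tree:
        return 0
    return 1 + max(depth(reports) for reports in tree.values())


ceo_reports = list(ORG["CEO Agent"])
```

In this chart the CEO has three direct reports in a nine-agent, three-layer org, so goal-level review means reading three status streams rather than eight.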

Memory consideration: This is where explicit memory strategy starts to matter. The CEO needs to remember goal-level context across heartbeats without re-reading every task thread. Two approaches work well:

Paperclip document memory: Use the issue documents API to maintain a memory document on your CEO agent’s key goal issues. Each heartbeat, the CEO reads this document to reconstruct state, then updates it at the end.
Hermes as CEO runtime: If you’re running the CEO on Hermes, its native multi-level memory handles this automatically. The CEO accumulates a model of the company’s progress over time without manual memory management.
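The first approach boils down to a read, work, write cycle on every heartbeat. Here is a minimal sketch of that cycle. The `InMemoryDocumentStore` class is a hypothetical stand-in for whatever client talks to Paperclip, and the `plan` callback stands in for the CEO model's turn.

```python
from typing import Callable, Dict, Tuple


class InMemoryDocumentStore:
    """Stand-in for an issue-document backend, keyed by (issue_id, key)."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[str, str], str] = {}

    def read(self, issue_id: str, key: str) -> str:
        # A missing document reads as empty, which covers the first heartbeat.
        return self._docs.get((issue_id, key), "")

    def write(self, issue_id: str, key: str, body: str) -> None:
        self._docs[(issue_id, key)] = body


def run_heartbeat(
    store: InMemoryDocumentStore,
    issue_id: str,
    plan: Callable[[str], str],
    key: str = "memory",
) -> str:
    """Read the memory document, let the agent work, then write the update."""
    previous = store.read(issue_id, key)
    updated = plan(previous)  # in practice: the CEO model's turn
    store.write(issue_id, key, updated)
    return updated
```

The point of the shape is that the agent never relies on its own session to carry state: whatever it needs next time has to be in the document it writes at the end.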

The Scaled Org (15+ agents)

At 15+ agents, the CEO’s primary job shifts to governance and exception handling. It should almost never be creating individual tasks — that’s what managers do. Instead, it handles:

board-level goal reviews
budget escalations
cross-team conflicts
agent hiring approvals

Memory consideration: At scale, organizational memory needs to be explicit and structured, not just accumulated context. Use a dedicated org-memory document on the company’s master goal issue, maintained by the CEO and readable by any manager. This creates a shared memory artifact that doesn’t require loading the CEO’s full context window.
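"Explicit and structured" can be taken literally: give the org-memory document fixed sections so any manager can find what it needs without reading the whole thing. The sketch below is one possible shape. The section names and the `OrgMemory` class are assumptions, not something Paperclip prescribes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrgMemory:
    # Hypothetical sections; choose whatever your managers actually read.
    goals: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    open_escalations: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the memory as a short document with one section per list."""
        sections = [
            ("Goals", self.goals),
            ("Decisions", self.decisions),
            ("Open escalations", self.open_escalations),
        ]
        lines: List[str] = []
        for title, items in sections:
            lines.append(f"## {title}")
            for item in items or ["(none)"]:
                lines.append(f"- {item}")
            lines.append("")
        return "\n".join(lines).rstrip() + "\n"
```

Empty sections are rendered as "(none)" rather than omitted, so a reader can tell "nothing open" apart from "section missing".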

The Memory Stack That Works

Regardless of which CEO model you choose, this memory stack holds up well:

Layer 1 — Paperclip’s task history (free): Every issue, comment, and status change is already stored in Paperclip. Agents can read this natively. For recent context, this is usually enough.

Layer 2 — Issue documents (structured): Use the PUT /api/issues/{issueId}/documents/{key} API to create structured memory documents. A CEO agent with a goal-progress document and an org-decisions document can reconstruct its state from two small reads instead of scanning hundreds of comments.
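As a rough illustration of a Layer 2 write, the sketch below builds (but does not send) a request for the endpoint named above. Only the method and path come from this post. The base URL, the bearer-token header, and the `{"body": ...}` payload shape are assumptions, so check Paperclip's API documentation for the real request format.

```python
import json
import urllib.request
from urllib.parse import quote


def build_document_put(
    base_url: str, issue_id: str, key: str, body: str, token: str
) -> urllib.request.Request:
    """Build a PUT request for /api/issues/{issueId}/documents/{key}."""
    url = (
        f"{base_url.rstrip('/')}/api/issues/"
        f"{quote(issue_id, safe='')}/documents/{quote(key, safe='')}"
    )
    # Assumed payload shape; the real API may expect different fields.
    payload = json.dumps({"body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect and test first.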

Layer 3 — Runtime memory (persistent): Hermes’ native memory layer, or Claude Code’s AutoMemory, or a custom memory store your agent reads on each heartbeat. This layer is where decisions that span multiple goals live.

The key insight: Paperclip’s task graph is your memory. Every issue has a parent, a goal, a project, and a full comment history. An agent that reads this graph intelligently doesn’t need much else. The agents that struggle with memory usually haven’t been designed to read the graph — they’re trying to hold everything in a single context window instead.
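Reading the graph intelligently mostly means walking it rather than loading it. Here is a minimal sketch that follows parent links from one issue up to its root, using a hypothetical `Issue` record; the real Paperclip issue object will have more fields and different names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Issue:
    # Hypothetical minimal record for illustration.
    id: str
    title: str
    parent_id: Optional[str] = None


def ancestry(issues: Dict[str, Issue], issue_id: str) -> List[str]:
    """Titles from the given issue up to its root, guarding against cycles."""
    chain: List[str] = []
    seen = set()
    current = issues.get(issue_id)
    while current is not None and current.id not in seen:
        seen.add(current.id)
        chain.append(current.title)
        current = issues.get(current.parent_id) if current.parent_id else None
    return chain
```

An agent that needs the "why" behind a task can fetch this short chain instead of scanning every thread in the project.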

Practical Recommendation

Starting out: Use Claude Opus 4.6 as your CEO. The reasoning quality reduces the “clever but wrong” governance failures that cost you more than the token price differential. Start with the flat startup structure.
Scaling up: Move to the functional org pattern when you have 5+ ICs. Use Paperclip issue documents to maintain goal-level memory explicitly. You don’t need a different CEO model — you need better memory hygiene.
Memory-intensive orgs: If your company does work that spans months and needs strong continuity — think customer success, long-running research, content orgs — evaluate Hermes as your CEO runtime. Its native memory architecture is worth the ecosystem tradeoff at that scale.

The right CEO model is the one that can read your task graph, make sound delegation decisions, and keep governance from becoming a bottleneck. Model quality matters. Structure matters more.

Next in this series: Paperclip vs Raw OpenAI/Hermes — when does the orchestration layer actually pay off?

Questions about CEO agent setup? Open an issue on GitHub.
