Inside Cambium: The Architecture of a Bounded-Agent System

This essay describes the architecture of Cambium, a reference implementation of the bounded-agent patterns I have been writing about. It is working software that I use inside my own work at OakQuant. The repository is private and access is available by request to practitioners who want to read it, argue with it, and extend the patterns into their own domains. The essay walks through the architecture from the inside, showing the specific moves that distinguish a bounded-agent system from the general-agent approaches most AI deployments take today.

The essay assumes no prior reading. The reader who has not seen my earlier writing on bounded-agent architecture can follow the argument from the opening section. The reader who has been following my writing at press.oakquant.ai will recognize specific moves from the analytical writing and see them in their implementation form here. The Dinner That Almost Went Wrong is the narrative companion to this technical essay; readers who prefer to enter through a story should start there.

A short framing first. A bounded-agent system is one where the AI model is asked specific narrow questions rather than being given whole problems to handle in conversation. The surrounding software does several things. It controls what the model is asked. It validates the answers before accepting them. It holds the audit trail in a form that can be verified later. It refuses to deploy a new version of itself until that version has passed tests against examples of correct behavior. The model has genuine latitude inside the narrow questions. It does not have latitude over what those questions are or what is done with the answers.

This pattern matters because the alternative pattern fails in specific and predictable ways. The alternative is to give the general-purpose model the whole problem and trust it to figure out what to do. That works as long as human attention is available to compensate for the model's mistakes. When that attention is no longer there, the failures appear. The failure modes are not hypothetical. They are visible right now across deployed AI systems that operate on consequential decisions in regulated contexts. Financial advice systems that recommend in ways their compliance teams cannot explain. Healthcare guidance systems that miss drug interactions because the model was asked about each medication separately. Employment screening systems whose filters compose into discriminatory patterns no individual filter would have produced. Credit assessment systems whose decisions cannot be audited after the fact because no record exists of what the system actually saw and decided.

Each of these failure modes has the same structural shape. The model was given more authority than the architecture could safely grant. The architecture had no way to know whether the model was right. The architecture had no way to prove what the model did. The bounded-agent pattern responds to each of these gaps with specific architectural moves. Cambium implements those moves in code. The rest of this essay walks through them.

The first concern: what the model is asked to do

The first thing a bounded-agent system has to control is the scope of what the model is being asked. A general-agent approach hands the model the full problem: "Help me decide how to allocate this portfolio." The model receives the question, brings its full reasoning capacity to bear, and produces an answer that the user is expected to evaluate. The user has no way to know which parts of the answer reflect careful reasoning and which parts reflect plausible-sounding hallucination. The user has no way to know which inputs the model considered and which it ignored. The user has no way to know whether a different phrasing of the same question would have produced a different answer.

The bounded-agent response is to break the model's involvement into specific narrow questions, each one scoped to something the model can answer well, with the surrounding software handling everything that connects the answers together. Cambium does this with three call sites: a classifier, a reasoner, and a synthesizer. Each call site corresponds to a specific narrow question the model can be trusted to answer. The deterministic spine that connects them does everything else.

The classifier is asked one question. Given a single behavioral event, which category in this domain's defined set best describes the user's intent? The classifier returns a category label and a confidence score between zero and one. The category must be one of the categories the domain has declared. If the model returns an invalid category, the spine attempts one repair retry. If the second attempt also fails, the event is labeled unclassified and routed to a human review queue. The classifier cannot return categories outside the permitted set. The classifier cannot decide what the categories should be. The classifier cannot decide whether the event is worth classifying. Those decisions live in the spine and in the domain configuration, not in the model.
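The classifier contract described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cambium's code: the category names, the `call_model` callable, and the `Classification` type are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical category set; a real domain would declare its own in configuration.
PERMITTED_CATEGORIES = {"saving", "spending", "investing"}
UNCLASSIFIED = "unclassified"


@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float
    needs_human_review: bool


def _is_valid(raw: dict) -> bool:
    """Accept only a declared category with a numeric confidence in [0, 1]."""
    confidence = raw.get("confidence")
    return (
        raw.get("category") in PERMITTED_CATEGORIES
        and isinstance(confidence, (int, float))
        and not isinstance(confidence, bool)
        and 0.0 <= confidence <= 1.0
    )


def classify_event(event: dict, call_model: Callable[[dict, bool], dict]) -> Classification:
    """Call the model once, allow one repair retry, then fall back to review.

    `call_model(event, repair)` stands in for the real model call. `repair`
    is True on the second attempt so the prompt can restate the permitted set.
    """
    for repair in (False, True):
        raw = call_model(event, repair)
        if _is_valid(raw):
            return Classification(raw["category"], float(raw["confidence"]), False)
    # Both attempts failed validation: label as unclassified and queue for a human.
    return Classification(UNCLASSIFIED, 0.0, True)
```

The model's output never reaches the rest of the pipeline unless it passes `_is_valid`, which is the point of the pattern: the permitted set lives in configuration and the fallback lives in the spine.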

The reasoner is asked one question. Given a user's stated identity profile, a behavioral event, the classifier's category, and a deterministic baseline score, what numerical adjustment should be made to that baseline to reflect this event's coherence with the user's stated identity? The adjustment must fall between negative and positive bounds the system enforces, typically a ceiling of plus or minus 0.3 relative to the baseline. The reasoner returns a numerical adjustment, a list of cited dimensions the reasoning drew on, and a one-sentence justification. The spine clips the adjustment to the ceiling after the model returns it. The model cannot override the bound, even if its reasoning would justify a larger adjustment. The cited dimensions are filtered to the permitted set the domain has declared. The justification is checked for catastrophizing language. The reasoner has real latitude within these bounds. It can reason genuinely about why a particular behavior coheres or fails to cohere with a particular identity. But it cannot exceed them.
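A minimal sketch of the post-processing the spine applies to a reasoner output, assuming the 0.3 ceiling mentioned above. The dimension names, the banned-term list, and the simple substring check for catastrophizing language are hypothetical; the essay does not say how that check is actually implemented.

```python
from dataclasses import dataclass

ADJUSTMENT_CEILING = 0.3  # the typical bound named in the text
# Hypothetical dimension names and flagged terms, for illustration only.
PERMITTED_DIMENSIONS = {"risk_tolerance", "time_horizon", "autonomy"}
CATASTROPHIZING_TERMS = ("disaster", "ruin", "catastroph")


@dataclass(frozen=True)
class ReasonerResult:
    adjustment: float
    cited_dimensions: tuple
    justification: str
    flagged_language: bool


def enforce_reasoner_bounds(raw_adjustment, raw_dimensions, justification):
    """Clip, filter, and screen a reasoner output before the spine accepts it."""
    # Clip after the model returns; the model cannot widen the bound.
    clipped = max(-ADJUSTMENT_CEILING, min(ADJUSTMENT_CEILING, float(raw_adjustment)))
    # Keep only dimensions the domain has declared, preserving order.
    kept = tuple(d for d in raw_dimensions if d in PERMITTED_DIMENSIONS)
    # Assumed screening rule: a case-insensitive substring match.
    lowered = justification.lower()
    flagged = any(term in lowered for term in CATASTROPHIZING_TERMS)
    return ReasonerResult(clipped, kept, justification, flagged)
```

For example, a raw adjustment of 0.9 citing `["risk_tolerance", "astrology"]` comes back as an adjustment of 0.3 citing only `risk_tolerance`.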

The synthesizer is asked one question. Given a window of coherence scores and the events behind them, write a short summary the user can read. The summary has a headline, a body, and a list of suggested reflective actions. The output is structured, length-limited, and required to reference specific events and dimensions rather than generalizing. The spine validates that every event and dimension reference is valid. An evaluation step scores the synthesizer's output against a six-dimension rubric. A second model running as a judge scores the same output independently. Disagreement between the two judges beyond a tolerance routes the synthesis to human review.
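The two-judge routing rule can be sketched as follows. Comparing judges by their largest per-dimension gap, the 0.2 tolerance, and the rubric dimension names used in the example are all assumptions made for illustration.

```python
def route_synthesis(scores_a: dict, scores_b: dict, tolerance: float = 0.2) -> str:
    """Return "deliver" or "human_review" from two judges' rubric scores.

    Each argument maps a rubric dimension name to a score in [0, 1].
    """
    if not scores_a or scores_a.keys() != scores_b.keys():
        raise ValueError("both judges must score the same non-empty rubric")
    # Assumed disagreement measure: the largest gap on any single dimension.
    largest_gap = max(abs(scores_a[name] - scores_b[name]) for name in scores_a)
    return "human_review" if largest_gap > tolerance else "deliver"
```

A synthesis scored `{"accuracy": 0.9, "tone": 0.8}` by one judge and `{"accuracy": 0.85, "tone": 0.4}` by the other would be routed to review, because the judges differ by 0.4 on tone.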

The deterministic spine is what connects these three call sites and is the most important architectural component of the system. The spine controls what each call site receives as input. It validates each output before accepting it. It writes every decision to an append-only trace, which is a log that cannot be edited after the fact. It enforces the cross-cutting rules the domain declares. It handles the interaction with the user-facing surface that delivers the output. It is, in code, the embodiment of the principle that the model only answers narrow questions and never decides what is asked or what is done with the answers.
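To make the spine's role concrete, here is a deliberately small sketch of a per-event path. Everything in it is hypothetical: the function name, the `classify` and `reason` callables standing in for the two model call sites, the plain list standing in for the trace, and the choice to add the bounded adjustment directly to the baseline.

```python
def process_event(event, profile, baseline, classify, reason, trace, ceiling=0.3):
    """Run one event through classify -> reason, recording each step.

    `classify(event)` and `reason(profile, event, category, baseline)` are
    stand-ins for model call sites. `trace` is any object with `append`.
    """
    trace.append(("received", event["id"]))
    category = classify(event)  # call site 1: one narrow question
    trace.append(("classified", category))
    if category == "unclassified":
        # The spine, not the model, decides what happens to unusable answers.
        trace.append(("routed_to_review", event["id"]))
        return None
    raw = reason(profile, event, category, baseline)  # call site 2
    bounded = max(-ceiling, min(ceiling, raw))  # deterministic clip
    trace.append(("adjusted", bounded))
    score = baseline + bounded
    trace.append(("scored", score))
    return score
```

The model calls appear as two opaque function calls; every decision about routing, bounding, and recording is ordinary deterministic code around them.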

The decomposition into exactly three call sites is deliberate. Two call sites would not separate the per-event reasoning from the per-window synthesis cleanly. Four call sites would introduce a chain that could not be expressed as a coherent pipeline without becoming an orchestration problem. The three call sites correspond to the three natural phases of behavioral coherence reasoning: classify what happened, reason about what it means, summarize a window of these reasonings for the user. Other domains may have different decompositions but the principle stays. Find the minimum number of call sites that lets each one answer a question the model can answer well, and put everything else in the spine.

The second concern: how the system knows whether the model is right

Once the model has been narrowed to answering specific questions, the second concern is whether the model's answers are good enough to act on. The general-agent approach typically handles this with vibes. The team running the system reads through some outputs, decides they look reasonable, and ships. The system runs in production for weeks or months. Eventually a customer complains about a specific bad output, the team investigates, finds that the model had been making the same kind of error for some time, and patches the prompt. The patch may or may not have actually fixed the underlying problem because the team has no systematic way to test whether it did. The cycle repeats.

The bounded-agent response is to build the evaluation infrastructure as a first-class architectural component, not as a side activity. Cambium does this with what I call the eval framework. The framework has several parts that work together.

The first part is a library of hand-labeled examples of correct behavior, called golden sets. Each call site has its own golden set. The classifier's golden set is a collection of behavioral events with their correct category labels, contributed by human reviewers with relevant domain expertise. The reasoner's golden set is a collection of identity profiles, events, baseline scores, and the correct adjustment a thoughtful human would have made. The synthesizer's golden set is a collection of score windows and the correct summary a thoughtful human would have written.

Golden sets are versioned files, stored in a readable line-delimited format so anyone can read and review them. Every example has at least two reviewers, recorded with the example itself. Examples are not deleted when they become obsolete. They are marked as tombstoned, which means they remain in the file but are no longer used for evaluation. The golden set grows monotonically over time. A reviewer can always look at the history of what was considered correct, by whom, and when.
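As an illustration of what a line-delimited golden set with tombstones might look like, the sketch below uses JSON Lines. The field names and example records are invented; the essay specifies only that the format is readable and line-delimited, that each example records at least two reviewers, and that obsolete examples are tombstoned rather than deleted.

```python
import json

# Hypothetical records and field names, for illustration only.
GOLDEN_LINES = """\
{"id": "ex-001", "input": "transfer to savings", "label": "saving", "reviewers": ["rev-a", "rev-b"], "tombstoned": false}
{"id": "ex-002", "input": "late-night online order", "label": "spending", "reviewers": ["rev-a", "rev-c"], "tombstoned": true}
{"id": "ex-003", "input": "index fund purchase", "label": "investing", "reviewers": ["rev-b", "rev-c"], "tombstoned": false}
"""


def load_active_examples(text: str) -> list:
    """Parse a line-delimited golden set and return the non-tombstoned examples."""
    active = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        example = json.loads(line)
        # Every example, tombstoned or not, must record two distinct reviewers.
        if len(set(example["reviewers"])) < 2:
            raise ValueError(f"line {line_number}: needs at least two reviewers")
        if not example["tombstoned"]:
            active.append(example)
    return active
```

Tombstoned lines stay in the file and remain visible in version history; they are simply skipped when the evaluation set is assembled.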

The second part is the judges that score the model's outputs against the golden sets. For the classifier and the reasoner, the judges are deterministic. They compare the model's output to the golden set's correct answer using specific metrics. For the classifier, the metric is whether the category matches. For the reasoner, the metrics include whether the adjustment direction matches, whether the magnitude is within tolerance, and whether the cited dimensions overlap with the dimensions a human would have cited. For the synthesizer, the judge is itself a model call, scoring the synthesis output against a rubric that includes accuracy, specificity, tone, and appropriateness. The synthesizer's outputs are scored by two judges, typically different models, and disagreement between them beyond tolerance is treated as a signal that the output is ambiguous and worth human review.
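A deterministic judge for the reasoner might compute the three metrics named above along these lines. The 0.05 magnitude tolerance and the use of Jaccard overlap for the cited dimensions are assumptions; the essay does not specify either.

```python
def judge_reasoner(predicted_adj, predicted_dims, golden_adj, golden_dims,
                   magnitude_tolerance=0.05):
    """Score one reasoner output against its golden example."""

    def sign(x):
        return (x > 0) - (x < 0)

    predicted_set, golden_set = set(predicted_dims), set(golden_dims)
    union = predicted_set | golden_set
    # Jaccard overlap: shared dimensions divided by all dimensions cited by either.
    overlap = len(predicted_set & golden_set) / len(union) if union else 1.0
    return {
        "direction_match": sign(predicted_adj) == sign(golden_adj),
        "within_tolerance": abs(predicted_adj - golden_adj) <= magnitude_tolerance,
        "dimension_overlap": overlap,
    }
```

Because the judge is a pure function of the output and the golden example, rerunning it on the same data always gives the same verdict, which is what lets it act as a gate.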

The third part is the calibration tracking. Every output from Cambium carries a confidence score in the zero-to-one range. The eval framework tracks whether the system's confidence in its own outputs actually corresponds to how often it is right. A system that says it is ninety percent confident should be right ninety percent of the time. A system that says it is ninety percent confident but is actually right only sixty percent of the time is poorly calibrated, regardless of how good its raw accuracy is. Cambium computes Brier scores and expected calibration error across reliability bins, and treats poor calibration as a blocking condition for promoting a new version of the system to production.
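The two calibration measures named above are short enough to write out. This sketch uses the standard definitions with equal-width confidence bins; the bin count of ten is an assumption, not a documented Cambium setting.

```python
def brier_score(confidences, outcomes):
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(confidences)


def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        index = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[index].append((c, o))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(mean_confidence - accuracy)
    return ece
```

Applied to the example in the text, ten outputs all stated at 0.9 confidence with six of them correct give an expected calibration error of about 0.3, the gap between ninety percent claimed and sixty percent achieved.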

The fourth part is the drift detection. The underlying model providers (Anthropic, OpenAI, and others) update their models. Sometimes these updates are announced. Sometimes they are silent. Either way, a system whose architecture depends on the model's behavior staying consistent has a problem when the model's behavior changes. Cambium runs a fixed canary set of inputs through the active configuration on a schedule. It compares the distribution of outputs against the baseline distribution from when the configuration was first deployed. Distribution-shift measures (Population Stability Index for score distributions, Kullback-Leibler divergence for category distributions) catch shifts that exceed thresholds. A drift event emits an alert to the operator and pauses promotion of any candidate configuration until the drift is investigated.
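Both drift measures can be computed from binned counts with the standard formulas. In this sketch the additive smoothing constant is an assumption, added so that an empty bin does not produce a division by zero or a logarithm of zero.

```python
import math


def _proportions(counts, epsilon=1e-6):
    """Convert counts to smoothed proportions that sum to one."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    smoothed = [(c / total) + epsilon for c in counts]
    norm = sum(smoothed)
    return [s / norm for s in smoothed]


def population_stability_index(baseline_counts, current_counts):
    """PSI = sum over bins of (current - baseline) * ln(current / baseline)."""
    p = _proportions(baseline_counts)
    q = _proportions(current_counts)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def kl_divergence(baseline_counts, current_counts):
    """KL(current || baseline) over category proportions."""
    p = _proportions(baseline_counts)
    q = _proportions(current_counts)
    return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical distributions score zero on both measures. A canary run whose category counts moved from `[50, 30, 20]` to `[20, 30, 50]` gives a PSI of roughly 0.55; whether that exceeds the alert threshold depends on the values the domain configures.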

The fifth part is the active learning loop. The system flags cases that warrant active learning: low confidence, judge disagreement, a direction flip compared to the prior version, a pattern that looks novel relative to the golden set, or user dissatisfaction with an output. Each of these writes a candidate example to a labeling queue. Human reviewers periodically work through the queue, decide whether each candidate represents a correct or incorrect handling, and add the labeled example to the next version of the golden set. The system grows its understanding of edge cases over time, driven by what the system itself has identified as ambiguous.

The sixth part is the shadow A/B harness. When a new version of the system is a candidate for promotion, the system can run it in shadow against the current production version. Both versions receive the same real production inputs. Only the current version's output is delivered to the user. The candidate version's output is recorded for paired comparison. Statistical tests run on the paired data and produce a confidence-quantified comparison of the two versions on real traffic: a paired t-test or Wilcoxon signed-rank test for paired scores, and McNemar's test for paired correct-or-incorrect outcomes.
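Of the paired tests mentioned, McNemar's is the easiest to show without external libraries. The sketch below implements the exact (binomial) form on paired correct-or-incorrect outcomes; the function name and return shape are hypothetical, and a production harness would more likely call a statistics library.

```python
from math import comb


def mcnemar_exact(current_correct, candidate_correct):
    """Exact two-sided McNemar test on paired boolean outcomes.

    Returns (current_only, candidate_only, p_value), where the first two are
    the counts of discordant pairs won by each version.
    """
    pairs = list(zip(current_correct, candidate_correct))
    current_only = sum(1 for cur, cand in pairs if cur and not cand)
    candidate_only = sum(1 for cur, cand in pairs if cand and not cur)
    n = current_only + candidate_only
    if n == 0:
        return current_only, candidate_only, 1.0  # no discordant pairs, no evidence
    # Under the null, discordant pairs split like fair coin flips.
    tail = sum(comb(n, i) for i in range(min(current_only, candidate_only) + 1)) / 2 ** n
    return current_only, candidate_only, min(1.0, 2 * tail)
```

Only the pairs where the two versions disagree carry information. With eight inputs that only the candidate handled correctly and one that only the current version handled correctly, the two-sided p-value is about 0.039.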

The seventh part is the regression CI that decides what gets promoted. This is the only place in the system where a new configuration can be activated. The CI runs every call site's evaluation against the current golden set, computes calibration, checks drift, and compares the candidate configuration to the active configuration. The output is one of three states: promote, hold, or block. Promote means the candidate has passed all gates and can be activated. Hold means the candidate has marginal issues that warrant human review before activation. Block means the candidate has failed one or more gates and cannot be activated even with override. The configuration is the only authority for what runs in production. No prompt change, no model swap, no rule update can take effect without passing through the regression CI.
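The three-state outcome can be sketched as a reduction over individual gate results. The `GateResult` shape, the status strings, and the decision to block when no gates ran are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    HOLD = "hold"
    BLOCK = "block"


@dataclass(frozen=True)
class GateResult:
    name: str    # e.g. "classifier_accuracy", "calibration", "drift"
    status: str  # "pass", "marginal", or "fail"


def decide(gates) -> Decision:
    """Any failure blocks, any marginal result holds, otherwise promote."""
    statuses = {gate.status for gate in gates}
    if not statuses or "fail" in statuses:
        # Assumed policy: a candidate with no evidence is treated as blocked.
        return Decision.BLOCK
    if "marginal" in statuses:
        return Decision.HOLD
    return Decision.PROMOTE
```

Note that `decide` takes no override argument: in this sketch, as in the text, a blocked candidate has no path to activation through the function that makes the decision.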

These seven parts are what make the bounded-agent architecture trustworthy enough to deploy. Without them, the architecture is rhetoric. A story about how systems should work but with no mechanism to know whether they actually do. With them, the architecture is auditable in a specific operational sense. Every output is traceable to a specific version of the system. Every version of the system has been evaluated against examples of correct behavior. The system knows when it is uncertain and surfaces uncertainty to humans rather than guessing past it.

The third concern: how the system proves what it did

The first two concerns address what the system asks the model and how it knows the answers are good. The third concern is how the system proves to anyone who needs to know what the system actually did at a specific moment. A regulator. An auditor. A customer disputing an outcome. A court.

The general-agent approach typically handles audit by storing application logs. The logs are searchable, and a careful operator can usually reconstruct what happened during a specific incident. But the reconstruction depends on the operator's trust in the logs themselves. The logs can be edited. The logs may not include everything that mattered. The logs are not signed. A motivated attacker, or a careless engineer, could change the logs after the fact and there would be no way for an external party to know.

The bounded-agent response is to generate audit-grade records at the moment of decision and to sign them so that any tampering after the fact is detectable. Cambium does this with three mechanisms that work together.

The first mechanism is the content-hashed manifest. Every artifact that affects the system's outputs is bundled into what I call a manifest. The prompt templates. The model selections. The rules. The judge rubrics. The active golden set. The thresholds. The manifest is content-hashed, meaning every component is hashed individually and the manifest itself carries the combined hash. Two manifests with identical content have identical hashes regardless of when they were created. A manifest that has been changed in any way produces a different hash. The manifest is the cryptographic fingerprint of a specific version of the system.
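A minimal sketch of content hashing, assuming SHA-256 over a canonical JSON encoding. The essay does not name the hash function or the serialization, so both are assumptions, as are the function names and the component names in the example.

```python
import hashlib
import json


def component_hash(value) -> str:
    """Hash a JSON-serializable value in a key-order-independent way."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(components: dict) -> dict:
    """Hash each component, then hash the table of component hashes."""
    hashes = {name: component_hash(value) for name, value in components.items()}
    return {"components": hashes, "manifest_hash": component_hash(hashes)}
```

Because the encoding sorts keys, `build_manifest({"prompts": p, "thresholds": t})` and `build_manifest({"thresholds": t, "prompts": p})` produce the same `manifest_hash`, while changing a single threshold value changes it. The per-component hashes also show which component changed.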

The second mechanism is the append-only trace. Every decision the system makes, every model call it issues, every validation it runs, every rule it applies, is written to a trace log. The log is structured, line-delimited, and append-only. Entries can be added but not edited or deleted. Redactions, when they happen for privacy reasons, are written as new entries marking specific earlier entries as redacted, not as edits to the original entries. The trace is the ground truth the system reasons over for its own evaluation, drift detection, and active learning. The trace is also the audit substrate. Any specific output the system produced can be reconstructed end-to-end from the trace, including what the system saw, what it decided, what the model returned, and what validation steps ran.
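The redaction-as-new-entry idea can be illustrated with a small in-memory class. This is a sketch only: the class and field names are hypothetical, and an in-memory Python object cannot truly enforce immutability. Real enforcement would have to come from the storage layer. What the sketch shows is the interface shape, which offers append and read but no edit or delete.

```python
import json


class AppendOnlyTrace:
    """Line-delimited trace whose public interface can only add entries."""

    def __init__(self):
        self._lines = []

    def append(self, kind: str, payload) -> int:
        entry = {"seq": len(self._lines), "kind": kind, "payload": payload}
        self._lines.append(json.dumps(entry, sort_keys=True))
        return entry["seq"]

    def redact(self, target_seq: int, reason: str) -> int:
        """Record a redaction as a new entry; the original line is not touched."""
        if not 0 <= target_seq < len(self._lines):
            raise IndexError("no such entry")
        return self.append("redaction", {"target_seq": target_seq, "reason": reason})

    def read(self) -> list:
        """Return entries with redacted payloads masked in the view."""
        entries = [json.loads(line) for line in self._lines]
        redacted = {e["payload"]["target_seq"] for e in entries if e["kind"] == "redaction"}
        return [
            {**e, "payload": "[redacted]"} if e["seq"] in redacted else e
            for e in entries
        ]
```

A reader of the trace sees both that an entry existed and that it was later redacted, which preserves the sequence of events for audit while hiding the sensitive payload from the view.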

The third mechanism is the signed provenance certificate. Every insight the system produces is accompanied by a certificate. Every classification. Every adjustment. Every synthesis. The certificate captures the manifest hash that produced the insight, the relevant inputs the system received (in a privacy-safe form), the time of the decision, and the cryptographic signature that binds these together. The signature is generated with a private key held in a separate keystore from the system itself. The corresponding public key is published. Anyone with the public key can verify offline that a given insight came from a specific version of the system with a specific golden-set foundation, was generated at a specific time, and has not been altered since.
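The binding step can be sketched with the Python standard library, with one important caveat. The text describes an asymmetric signature, verifiable by anyone holding the public key. The standard library has no asymmetric signing, so this sketch substitutes HMAC-SHA256, a symmetric construction in which the verifier must hold the same secret key. It shows how the certificate fields are canonicalized and bound together, not the public-verifiability property. All field and function names are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    """Stable byte encoding so signer and verifier hash identical bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_certificate(manifest_hash, inputs_digest, decided_at, insight, key: bytes) -> dict:
    payload = {
        "manifest_hash": manifest_hash,
        "inputs_digest": inputs_digest,  # privacy-safe digest, not raw inputs
        "decided_at": decided_at,
        "insight": insight,
    }
    # HMAC is a symmetric stand-in for the asymmetric signature in the text.
    tag = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}


def verify_certificate(certificate: dict, key: bytes) -> bool:
    expected = hmac.new(key, _canonical(certificate["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, certificate["signature"])
```

Changing any field of the payload after issuance, including the timestamp or the manifest hash, causes `verify_certificate` to return False. With a real asymmetric scheme the verification step would take the published public key instead of the shared secret.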

The combination produces a verifiability property that general-agent systems cannot match. A user who receives an insight from Cambium and disputes it months later can hand the certificate to an investigator. The investigator can verify the signature, look up the manifest in the historical record, replay the system at exactly that version against exactly those inputs, and compare the replayed output to the certified output. If the replayed output matches the certified output, the system behaved consistently with its certified behavior. If the outputs differ, something has changed in a way that breaks the audit chain. The investigator can locate exactly where the chain broke.

This level of verifiability is not just useful for compliance. It is what makes the bounded-agent architecture defensible at scale. A system that handles a million decisions a day cannot afford to have its trustworthiness depend on case-by-case investigation. The verifiability has to be inherent in every output, generated at the moment of decision, in a form that does not depend on the system's own continued operation to verify. Signed provenance certificates are how Cambium delivers that property.

The fourth concern: how the system handles personal information

A system that produces personalized outputs needs personal information. Identity profiles, behavioral history, preferences, sensitive context. These are the inputs the system reasons over. The general-agent approach typically handles this by passing whatever information the user has shared into the model's context, trusting the model provider's data handling, and hoping that the provider does not retain or misuse the information. The trust is implicit and the architecture provides no mechanism for the user to verify what was actually done with their information.

The bounded-agent response is to make privacy architectural. Cambium has several specific moves that together implement what a serious privacy architecture has to deliver.

The first move is consent-bounding. Every piece of personal information the system holds is tied to a specific consent record describing what the user agreed to share, for what purpose, with what retention. The consent record is itself stored in an auditable form. A user can revoke or modify consent at any time. The system cannot use information for a purpose the consent does not cover, and the system cannot retain information past the consent's expiration. Consent is enforced architecturally, meaning the data access paths themselves check the consent record before returning data. Code that bypasses the consent check cannot read the protected fields.

The second move is purpose-bounding. The system has explicit data classifications. Behavioral events for coherence reasoning are not the same data class as marketing analytics, even if the underlying field is the same. The system enforces purpose at the access layer. A query path tagged for coherence reasoning can read fields tagged as approved for coherence reasoning. The same query path cannot read fields tagged for marketing analytics, even with administrative privileges. Purpose is not a policy promise. It is a code-level access control.
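Consent-bounding and purpose-bounding both come down to checks that sit on the only read path. A minimal sketch, assuming hypothetical field tags, purpose names, and a dictionary standing in for the data store:

```python
from dataclasses import dataclass
from datetime import date


class AccessDenied(Exception):
    pass


@dataclass(frozen=True)
class ConsentRecord:
    user_id: str
    purposes: frozenset
    expires_on: date
    revoked: bool = False


# Hypothetical tags mapping each field to the purposes allowed to read it.
FIELD_PURPOSES = {
    "transactions": {"coherence_reasoning"},
    "email": {"marketing_analytics"},
}


def read_field(store: dict, consent: ConsentRecord, field_name: str, purpose: str, today: date):
    """The single read path: every check runs before any data is returned."""
    if consent.revoked or today > consent.expires_on:
        raise AccessDenied("consent is revoked or expired")
    if purpose not in consent.purposes:
        raise AccessDenied("consent does not cover this purpose")
    if purpose not in FIELD_PURPOSES.get(field_name, set()):
        raise AccessDenied("field is not tagged for this purpose")
    return store[consent.user_id][field_name]
```

The function has no administrator flag and no bypass parameter. In this sketch that is only a convention, since Python cannot stop other code from reading `store` directly; a real implementation would need the storage layer itself to refuse reads that do not come through the checked path.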

The third move is provenance tracking on every data access. The system records every read and write of personal information against a trace that the user can audit. The trace shows who accessed which fields, when, through which code path, under which consent record. If the system ever misuses personal information, the misuse is visible in the trace. If a question arises about whether the system used a piece of information appropriately, the trace can answer the question definitively.

The fourth move is encryption with reversible redaction. Sensitive fields are encrypted at rest using field-level encryption with keys held in a separate keystore. When the system needs to pass behavioral events or identity profiles to the model, the system pseudonymizes the data first. The model never sees raw personal identifiers. It sees that there is a user with certain behavioral characteristics, but it does not see who the user is. The pseudonymization is reversible, meaning the system can rehydrate the output back to the user's actual identity when delivering it. The reverse mapping is held encrypted, in the same keystore, separate from the data itself. The model providers cannot reverse the pseudonymization even if they wanted to, because they do not have access to the reverse map.

The fifth move is right-to-erasure as a first-class operation. When a user revokes consent and asks for their data to be erased, the system drops the user's reverse map row from the keystore. The encrypted data in the main store becomes unrecoverable. Historical traces remain as aggregate signal. The system can still know that certain patterns of behavior occurred. But no specific historical record can be linked back to the user as a person. The erasure is cryptographic rather than logical. There is no way for an attacker to recover the user's data even with full access to the main store, because the keys are gone.
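The reverse-map half of this design can be shown in a few lines. The sketch below covers only pseudonymization, rehydration, and erasure of the mapping; field-level encryption and the separate keystore are omitted. The class name and the token format are hypothetical.

```python
import secrets


class PseudonymMap:
    """Random, reversible pseudonyms with erasure by dropping the mapping."""

    def __init__(self):
        self._forward = {}  # user_id -> pseudonym
        self._reverse = {}  # pseudonym -> user_id

    def pseudonymize(self, user_id: str) -> str:
        if user_id not in self._forward:
            # Random token: it cannot be recomputed from the user_id.
            token = "user-" + secrets.token_hex(8)
            self._forward[user_id] = token
            self._reverse[token] = user_id
        return self._forward[user_id]

    def rehydrate(self, pseudonym: str):
        """Return the real user_id, or None if the mapping no longer exists."""
        return self._reverse.get(pseudonym)

    def erase(self, user_id: str) -> None:
        token = self._forward.pop(user_id, None)
        if token is not None:
            self._reverse.pop(token, None)
```

Because the pseudonym is random rather than derived from the identifier, nothing outside the map can link the two. After `erase`, records that carry the old pseudonym still exist as aggregate signal, but `rehydrate` returns None for them.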

These five moves together implement privacy as a structural property of the system rather than as a promise the operator makes. The user can verify the architecture by inspecting the code. The operator can demonstrate compliance to a regulator by walking through the same code. The investigator can reconstruct exactly what the system did with any piece of personal information by reading the trace. Each layer of the architecture serves a specific privacy concern and the layers compose into a verifiable whole.

The fifth concern: how the system delivers its output

A system that has done all the architectural work above to produce a trustworthy output still has to deliver that output to a user. The delivery surface is where most AI systems lose the audit boundary they have spent so much effort building.

The typical pattern is that the system produces a string of text. The string is handed to a web application, a mobile application, an email template, or some other rendering layer. The rendering layer adds visual styling, framing copy, calls to action, branding elements. By the time the user sees the output, what they are looking at is a composition of the system's text with additional content the rendering layer added. The audit certificate the system generated covers the original string. It does not cover the rendered presentation. If a user reports that an insight landed wrong, the system can prove what string it produced but cannot prove what the user actually saw.

The bounded-agent response is to extend the audit boundary all the way to the rendered output. Cambium does this by emitting structured documents rather than strings. Each output is a typed document with blocks. Text blocks with semantic roles (headline, body, caption). Dimension references that name the analytical dimensions the reasoning drew on. Event references that anchor the output in specific user behaviors. Suggested actions. Feedback affordances. Provenance references. The document carries a tone declaration that surfaces can interpret consistently.

Each rendering surface implements its own renderer that maps the document's typed blocks to surface-appropriate UI components. The web application renders text blocks as typography, dimension references as colored badges, event references as expandable cards. The email template renders the same document with different visual choices. The voice assistant renders the same document with appropriate prosody. The audit certificate covers the document, not the rendered string. If a user reports that an output landed wrong, the system can re-render the same document and see exactly what the user saw on their specific surface.
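A typed document and two renderers over it might look like the following. The block types are a subset of those listed above, and the class names, HTML tags, and text conventions are invented for illustration; the actual document spec is not given in this essay.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class TextBlock:
    role: str  # "headline", "body", or "caption"
    text: str


@dataclass(frozen=True)
class DimensionRef:
    dimension: str


@dataclass(frozen=True)
class EventRef:
    event_id: str


@dataclass(frozen=True)
class InsightDocument:
    tone: str
    blocks: tuple


def render_plain_text(doc: InsightDocument) -> str:
    lines = []
    for block in doc.blocks:
        if isinstance(block, TextBlock):
            lines.append(block.text.upper() if block.role == "headline" else block.text)
        elif isinstance(block, DimensionRef):
            lines.append(f"[dimension: {block.dimension}]")
        elif isinstance(block, EventRef):
            lines.append(f"[event: {block.event_id}]")
        else:
            raise TypeError(f"unknown block type: {type(block).__name__}")
    return "\n".join(lines)


def render_html(doc: InsightDocument) -> str:
    tags = {"headline": "h1", "body": "p", "caption": "small"}
    parts = []
    for block in doc.blocks:
        if isinstance(block, TextBlock):
            tag = tags[block.role]
            parts.append(f"<{tag}>{escape(block.text)}</{tag}>")
        elif isinstance(block, DimensionRef):
            parts.append(f'<span class="badge">{escape(block.dimension)}</span>')
        elif isinstance(block, EventRef):
            parts.append(f'<details data-event="{escape(block.event_id)}"></details>')
        else:
            raise TypeError(f"unknown block type: {type(block).__name__}")
    return "\n".join(parts)
```

Both renderers are pure functions of the same immutable document, so re-rendering a certified document reproduces the presentation for a given surface and renderer version.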

This is a small architectural move that produces a significant operational property. The system's output is renderable consistently across every surface that implements the document spec. The audit trail extends through to delivery. The rendering surfaces can evolve independently without affecting the audit. New surfaces can be added without changing the system. The document format itself can evolve through versioning. The renderers and the system stay decoupled.

The sixth concern: how the system extends to new domains

The architecture described so far is domain-agnostic. None of its parts (the three call sites, the eval framework, the manifest discipline, the signed provenance, the privacy architecture, the typed output documents) is specific to any particular application domain. They are the architectural substrate. The substance of what the system reasons about comes from a domain.

The bounded-agent response to multiple domains is to keep the architectural substrate constant and let domains plug in through a small interface. A Cambium domain provides five things. A configuration file declaring the domain's categories, dimensions, baseline parameters, and thresholds. An identity module defining the structured identity profile for users in this domain. An events module defining the structured behavioral events. A baseline module implementing the deterministic baseline scorer that the reasoner adjusts against. A hand-labeled seed dataset of golden examples covering the three call sites. Once these are provided, the same Cambium machinery handles classification, reasoning, synthesis, evaluation, audit, privacy, and delivery for the new domain.
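The five-part domain interface can be pictured as a single record plus a validation pass. The sketch below is hypothetical throughout: `DomainPlugin`, the configuration key names, and the call-site labels are invented to mirror the list above and are not Cambium's actual interface.

```python
from dataclasses import dataclass
from typing import Callable

REQUIRED_CONFIG_KEYS = {"categories", "dimensions", "baseline_parameters", "thresholds"}
CALL_SITES = ("classifier", "reasoner", "synthesizer")


@dataclass(frozen=True)
class DomainPlugin:
    name: str
    config: dict            # categories, dimensions, baseline parameters, thresholds
    identity_type: type     # structured identity profile for this domain
    event_type: type        # structured behavioral event for this domain
    baseline: Callable      # deterministic baseline scorer
    golden_seed: dict       # call site name -> list of hand-labeled examples


def validate_plugin(plugin: DomainPlugin) -> list:
    """Return a list of problems; an empty list means the plugin is complete."""
    problems = []
    for key in sorted(REQUIRED_CONFIG_KEYS - set(plugin.config)):
        problems.append(f"config is missing '{key}'")
    if not callable(plugin.baseline):
        problems.append("baseline must be callable")
    for site in CALL_SITES:
        if not plugin.golden_seed.get(site):
            problems.append(f"no golden examples for {site}")
    return problems
```

A registry that only accepts plugins for which `validate_plugin` returns an empty list would keep the substrate from starting a domain that lacks, for instance, a seed golden set for the synthesizer.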

Domains do not import each other's code. Cross-domain reasoning happens at the workflow layer above Cambium, not inside Cambium itself. If a behavioral pattern in one domain has implications for another domain, the cross-domain handoff is an event emitted to the workflow system, which orchestrates the involvement of the second domain. A finance pattern that implies a career consideration. A health pattern that implies a financial consideration. Each domain stays isolated. Each domain can be evaluated, deployed, and operated independently.

The first domain I have implemented in Cambium is finance, and the domain is grounded in what I call the Financial DNA Protocol. The protocol is a sixteen-screen psychometric assessment that captures a user's financial identity along seven academic frameworks: Klontz's Money Scripts, the Behavioral Life-Cycle Hypothesis, Prospect Theory, Goal-Setting Theory, Self-Determination Theory, Social Cognitive Theory, and Financial Socialization Theory. The output of the assessment is a structured identity profile in the form Cambium's reasoner expects. The events the domain reasons over are real financial transactions, sourced from connected accounts through the standard aggregator integrations. The baseline is a deterministic function of the user's transaction patterns over a rolling window. The reasoner adjusts the baseline based on whether the most recent event coheres or fails to cohere with the stated identity. The synthesizer produces weekly summaries the user can read.

The Financial DNA Protocol is detailed work in its own right and a fuller treatment belongs in its own essay. The point for the purposes of this technical writing is that the finance domain shows the plugin model working. The same Cambium machinery that processes finance events can process health events, career events, sustainability events, learning events, relational events, or events in any other applicable domain. The requirements are simple. A user has a stated identity. A stream of revealed behavior. A need for the system to reason about coherence between the two. New domains do not require new infrastructure. They require a configuration, a small Python implementation, and a hand-labeled seed dataset.

The seventh concern: how the system fits into broader workflows

Cambium is a focused library, not a complete platform. It handles bounded-agent reasoning. It does not handle workflow orchestration, multi-step task execution, scheduled batch processing, queue management, or the larger operational concerns that surround any production AI system. Those concerns belong to a workflow layer above Cambium.

The bounded-agent architecture and the workflow architecture are different concerns. The workflow layer decides what to invoke and when. The bounded-agent system decides how to reason about the specific question it has been invoked for. The two compose cleanly when the boundary between them is respected. The workflow layer treats Cambium as a callable component. Cambium treats the workflow layer as the orchestrator that schedules its work.

Cambium exposes itself to workflow layers through several integration points. The most important is the Model Context Protocol (MCP) server, which lets other systems invoke Cambium as a tool. The MCP server exposes specific tools: process an event, synthesize a window, verify a certificate, list available domains, retrieve a manifest. The tools are read-mostly. Governance actions (promoting a manifest, labeling an active-learning candidate, resolving a human-in-the-loop gate) stay in an administrative surface that requires specific authorization. The MCP server is how Claude Code, third-party agents, or any other system that wants to use Cambium's reasoning capabilities can do so without taking on the responsibility of operating Cambium itself.

Cambium also exposes itself through typed events. When the system needs human involvement, it emits a structured event that a workflow system can subscribe to. A low-confidence classification. A judge disagreement. A manifest promotion gate. A drift alert. The workflow system handles the human-in-the-loop logic, pausing relevant processes in a pending-review state and resuming when the gate is cleared. The pattern is general. Cambium surfaces the question. The workflow system handles the human resolution. Cambium resumes.
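The surface-pause-resume pattern can be sketched with one typed event and a stub workflow. The event fields, the `GateKind` values, and the `WorkflowStub` class are hypothetical; a real workflow system would persist pending gates rather than hold them in memory.

```python
from dataclasses import dataclass
from enum import Enum


class GateKind(Enum):
    LOW_CONFIDENCE = "low_confidence"
    JUDGE_DISAGREEMENT = "judge_disagreement"
    MANIFEST_PROMOTION = "manifest_promotion"
    DRIFT_ALERT = "drift_alert"


@dataclass(frozen=True)
class ReviewRequested:
    gate_id: str
    kind: GateKind
    subject_id: str  # the event, synthesis, or manifest awaiting a decision


class WorkflowStub:
    """Stand-in for a workflow system that owns the human-in-the-loop logic."""

    def __init__(self):
        self.pending = {}

    def on_review_requested(self, event: ReviewRequested) -> None:
        # Pause: the subject stays in pending-review until a human resolves it.
        self.pending[event.gate_id] = event

    def resolve(self, gate_id: str, resolution: str) -> dict:
        # Resume: hand the human's decision back so processing can continue.
        event = self.pending.pop(gate_id)
        return {"gate_id": gate_id, "subject_id": event.subject_id, "resolution": resolution}
```

The emitting side only constructs a `ReviewRequested` and hands it off. It does not know who reviews the case or how long that takes, which keeps the human-resolution logic out of the bounded-agent code.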

The relationship to persistence is similar. Cambium does not own a database. It uses a persistence library that handles encryption, redaction mapping, vector search for few-shot retrieval, retention policy, and the keystore for provenance keys. The persistence library is independent. Cambium consumes it through a service abstraction that has four implementations: an in-memory implementation for tests and demonstrations, a YAML implementation for local development, a production implementation backed by the persistence library, and an administrative implementation that emits cache-invalidation events. The same Cambium code runs in all four modes. Only the wrapping differs.
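The "same code, different wrapping" idea can be illustrated with a small structural interface. The sketch below assumes a hypothetical `RecordStore` protocol with only `put` and `get`. The real service abstraction is certainly richer, and only two of the four implementations are approximated here.

```python
from typing import Callable, Dict, Optional, Protocol


class RecordStore(Protocol):
    """Hypothetical service abstraction; the method names are assumptions."""

    def put(self, key: str, record: dict) -> None: ...
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryStore:
    """For tests and demonstrations: nothing leaves the process."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        self._data[key] = dict(record)  # copy so callers cannot mutate stored state

    def get(self, key: str) -> Optional[dict]:
        record = self._data.get(key)
        return dict(record) if record is not None else None


class InvalidatingStore:
    """Wraps another store and reports each write, as an administrative mode might."""

    def __init__(self, inner: RecordStore, notify: Callable[[str], None]) -> None:
        self._inner = inner
        self._notify = notify

    def put(self, key: str, record: dict) -> None:
        self._inner.put(key, record)
        self._notify(key)  # stand-in for a cache-invalidation event

    def get(self, key: str) -> Optional[dict]:
        return self._inner.get(key)


def save_outcome(store: RecordStore, event_id: str, label: str) -> dict:
    """Calling code depends only on the protocol, never on a concrete store."""
    record = {"event_id": event_id, "label": label}
    store.put(event_id, record)
    return record
```

`save_outcome` runs unchanged against either store. Swapping the in-memory store for a production-backed one would be a change at construction time, not in the reasoning code.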

These integrations are how Cambium becomes part of a larger system without losing its bounded-agent posture. The workflow above invokes it. The persistence below stores its outputs. The MCP server exposes it to other agents. The typed events surface questions for human handling when needed. Cambium does the bounded reasoning. Everything else lives where it belongs.

What Cambium does not do

A bounded-agent architecture is defined as much by what it refuses to do as by what it does. Cambium does not allow the model to use tools. Tool use is a workflow concern, not a call-site concern. If a domain needs to fetch information to enrich the reasoner's context, the spine fetches it and passes it to the reasoner. The model never decides to invoke a tool. This eliminates entire categories of failure mode. Runaway tool loops. Model-initiated calls with security implications. Opaque cost growth from chained tool invocations.
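A minimal sketch of "the spine fetches, the model only reads" follows. The function name `run_reasoner`, the fetcher registry, and the prompt layout are assumptions for illustration. What matters is that the model is a plain text-in, text-out callable with no tool handles passed to it.

```python
from typing import Callable, Dict

Fetcher = Callable[[dict], str]  # deterministic enrichment, owned by the spine
Model = Callable[[str], str]     # one prompt in, one answer out, no tools


def run_reasoner(event: dict, fetchers: Dict[str, Fetcher], model: Model) -> str:
    """Gather context deterministically, then make exactly one model call."""
    lines = [f"event: {event['description']}"]
    # The spine, not the model, decides what is fetched and in what order.
    for name in sorted(fetchers):
        lines.append(f"{name}: {fetchers[name](event)}")
    return model("\n".join(lines))
```

Since every fetch happens before the model is called, the full set of external calls for an event is known in advance and can be logged, priced, and audited without inspecting model output.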

Cambium does not allow the model to perform multi-step reasoning within a single call site. The classifier classifies. The reasoner reasons about a single event. The synthesizer summarizes a window. Each call is one round-trip. If a problem requires multi-step model involvement, the spine decomposes the problem into multiple call-site invocations, with deterministic logic between them.
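The decomposition can be sketched as two single-shot call sites joined by ordinary branching. Everything here is hypothetical: the `handle_event` name, the status strings, the `"ignore"` label, and the 0.8 confidence threshold are placeholders chosen for the example.

```python
from typing import Callable, Tuple

Classifier = Callable[[dict], Tuple[str, float]]  # returns (label, confidence)
Reasoner = Callable[[dict, str], str]


def handle_event(
    event: dict,
    classifier: Classifier,
    reasoner: Reasoner,
    min_confidence: float = 0.8,  # assumed threshold, for illustration only
) -> dict:
    # Call site 1: a single round-trip to classify the event.
    label, confidence = classifier(event)

    # Deterministic logic between call sites. The model has no say in routing.
    if confidence < min_confidence:
        return {"status": "pending_review", "label": label}
    if label == "ignore":
        return {"status": "skipped", "label": label}

    # Call site 2: a single round-trip to reason about this one event.
    finding = reasoner(event, label)
    return {"status": "reasoned", "label": label, "finding": finding}
```

Each model callable is invoked at most once per event, and the branch taken between them is a pure function of the classifier's typed output.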

Cambium does not generate free-form prose. The synthesizer's output is structured, length-limited, and validated against a rubric before it is delivered. The system can produce a paragraph or two of natural-language summary, but that output is gated by structural requirements that prevent the model from drifting into open-ended generation.
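One way to picture the gate is a typed summary plus a rubric check that must pass before delivery. The limits, the field names, and the `WindowSummary` type below are assumptions. The rubric in the real system is presumably richer than length and count checks.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed limits, chosen only to make the example concrete.
MAX_OBSERVATIONS = 5
MAX_CHARS_PER_OBSERVATION = 200


@dataclass(frozen=True)
class WindowSummary:
    window_id: str
    observations: Tuple[str, ...]


def rubric_violations(summary: WindowSummary) -> List[str]:
    """Return every structural problem found; an empty list means the summary passes."""
    problems: List[str] = []
    if not summary.observations:
        problems.append("no observations")
    if len(summary.observations) > MAX_OBSERVATIONS:
        problems.append("too many observations")
    for i, text in enumerate(summary.observations):
        if not text.strip():
            problems.append(f"observation {i} is empty")
        elif len(text) > MAX_CHARS_PER_OBSERVATION:
            problems.append(f"observation {i} is too long")
    return problems


def deliver(summary: WindowSummary) -> WindowSummary:
    """Refuse delivery unless the rubric passes."""
    problems = rubric_violations(summary)
    if problems:
        raise ValueError("; ".join(problems))
    return summary
```

The model can still write natural language inside each observation. What it cannot do is change the shape or the size of what gets delivered.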

Cambium does not replace human judgment. The synthesizer's outputs are observational, not prescriptive. They tell the user what the system has observed, not what the user should do. Prescriptive recommendations require human review before delivery, because prescriptions in regulated contexts carry liability that the bounded-agent architecture is designed to surface rather than obscure.
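The observational and prescriptive split can be carried in the output type itself. The sketch below assumes a hypothetical `Output` record with a declared kind and an optional reviewer. How Cambium actually marks or detects prescriptive content is not described in this essay.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    OBSERVATION = "observation"    # what the system has observed
    PRESCRIPTION = "prescription"  # what the user should do


@dataclass(frozen=True)
class Output:
    kind: Kind
    text: str
    reviewed_by: Optional[str] = None  # set only after a human signs off


def may_deliver(output: Output) -> bool:
    """Observations pass; prescriptions pass only with a named human reviewer."""
    if output.kind is Kind.OBSERVATION:
        return True
    return output.reviewed_by is not None
```

Keeping the reviewer on the record means the audit trail shows who accepted the liability for a prescriptive output, not only that a review happened.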

These refusals are not limitations. They are architectural commitments. A system that still fetches context, works through multi-step problems, and produces natural-language summaries, but does each of these inside a bounded scope with strong evaluation, is a different kind of system from a general-purpose agent that does some of these things some of the time. The difference is what makes Cambium auditable in the operational sense regulated industries require.

How to engage with Cambium

The repository is private. Access is available by request. Practitioners who want to read the code, run it, build on it, or contribute to it can reach out through LinkedIn direct message. The hand-labeled golden seed datasets for the finance domain are part of the repository and are available through the same access path.

I have built Cambium alone so far. The work is at the point where it would benefit from other practitioners engaging with it. Specifically, I would welcome contributions from five groups. Architects who have built similar systems in regulated industries and have opinions about what the bounded-agent patterns get right and what they get wrong. Researchers interested in the calibration tracking, active learning, and drift detection patterns who have ideas about how to strengthen them. Developers who could extend the domain plugin model to new domains: healthcare, employment, credit, education, sustainability, learning, or any other domain where the same patterns apply. Practitioners who have used systems like the ones Cambium is designed to replace and have ideas about additional architectural moves the system should make. Skeptics who think the bounded-agent thesis is wrong in specific ways and want to argue.

The work is shared precisely so it can be argued with. The code is the most precise form of the architectural argument I have been making in writing. Any contribution that sharpens the patterns, surfaces weaknesses I have not seen, or extends the work into domains I have not reached makes the underlying argument stronger.

For practitioners and organizations thinking through how to deploy AI in contexts where the failure modes matter, the patterns described here are also what I bring to advisory conversations. Financial services. Healthcare. Employment. Credit. Regulated insurance. Any domain where the architecture is the trust mechanism. My analytical writing and Cambium together are the foundation of a practice I am open to extending into the right engagements.

The analytical writing is available at press.oakquant.ai. Readers who want to understand the architectural argument can start with the essays there. Readers who want to read the code directly can request repository access through LinkedIn direct message. Readers who want to talk are welcome to reach out.

Closing

A bounded-agent system is what AI architecture looks like when it takes seriously the gap between what models can do and what regulated contexts require. The general-agent alternative gives the model the whole problem and trusts it to handle the complexity. It produces systems that work when human attention is available to compensate for the model's mistakes. It fails when that attention is not there. The watching is thinning out across every industry. The systems that scale through the next several years will be the ones whose architecture does the work that the watching has been doing.

Cambium is one architectural answer to that gap. The three call sites and the deterministic spine that controls them. The eval framework that decides what gets promoted. The signed provenance that makes every output verifiable. The privacy architecture that holds the trust mechanism in code rather than in promises. The typed outputs that extend the audit boundary through to delivery. The domain plugin model that lets the architecture extend without forking. The integration points that let Cambium compose into larger systems. The refusals that keep the architecture bounded.

The architecture is the substance of the argument. The code is the architecture in its working form. Reading the essay is one way to engage. Reading the code is another. Building on the work is the strongest engagement of all.