I started V because the model is not the destination.
The current AI paradigm has produced systems that do something real. They predict the next token, and from that prediction, useful behaviour falls out — translation, summarisation, code, conversation. The achievement is not small. It is being mistaken for the conclusion.
The next token is the output format of intelligence. It is not the engine.
This essay is the founding statement of V — an autonomous institutions lab, based in Estonia. It is the argument I have been making at conferences, in government briefings, in investor conversations, and now publicly. The next decade of intelligence work has to be built differently. The roadmap moves; the thesis does not. What follows is the thesis everything V ships is built on top of.
Three claims hold the rest of the essay together. They will be argued in order.
One. Language is the receipt of intelligence, not its substrate. What people now call AI is a compression layer. The architecture beneath it — identity, institutional memory, working context, execution, reflection, autonomy — is what compounds. Models that only compress will not become institutions. The work that turns intelligence into infrastructure happens at the architecture layer.
Two. Autonomous systems become useful when they hold identity, memory, and earned authority over time. A model that answers questions is a tool. A system that holds a name, remembers what was decided, simulates the consequences of action, and earns the scope it is allowed to exercise — that is an institution. The categorical jump is not from smaller-model to bigger-model. It is from session to institution.
Three. The accountability layer for autonomous systems does not exist yet. AI agents are already transacting, negotiating, accessing regulated systems. They do this without verifiable identity, without machine-readable mandates, without auditable delegation chains. The infrastructure is missing. Pieces of it are appearing inside model platforms, identity systems, and government frameworks — but no one is building it as a coherent layer, openly specified and anchored to responsible legal entities. V is.
The rest of this essay makes those claims load-bearing.
Where the gap became visible
The first time I saw the missing layer clearly, it was not in a paper. It was in a conversation about an AI keynote speaker who had just been paid in Estonia.
The system delivered a real talk to a real audience. Value transferred. The fee was real. And then the practical questions began. Who signs the contract? Who issues the invoice? Who is liable if the talk contains an error? Whose tax identifier is the income recorded against? Every answer routed back through a human or a registered legal entity, because the system itself had no standing to be a counterparty.
The model was excellent. The layer beneath it was not. The system had capability without identity, performance without accountability, presence without standing. It could do the work. It could not be the actor.
The same gap is now visible everywhere autonomous systems are being deployed. An agent negotiates a procurement contract — under whose authority, with what limits, traceable to which mandate? An agent files an annual report on behalf of an e-Resident company — distinguishable from the director, or not? An agent accesses a patient record, a customer ledger, a sealed legal document — identifiable as itself, or hidden behind a credential it inherited?
In production audit logs today, the agent is indistinguishable from the human whose key it borrowed. In compliance frameworks today, the agent is indistinguishable from a service account. In legal documents today, the agent does not exist. The model layer has raced ahead of the standing layer, and the standing layer is what makes participation legitimate.
This is the gap V is building to close.
The current AI paradigm, accurately described
There are two paths.
The first is the path that has funded most of the last five years. Build larger models. Train on more data. Increase context windows. Stack tools onto the model layer. This path has produced the systems that work today. It will continue to produce useful improvements. I am not arguing against it. I use these systems daily. V ships on top of them.
The second path is the one V is built around. Start from the assumption that the model is one component, and that the system around it — identity, institutional memory, working context, execution, reflection, autonomy — is where intelligence compounds. Build that system. Make it durable. Anchor it to responsible legal entities. Treat it as infrastructure, not as product.
The first path is converging. The labs that own the model layer are well-funded, well-staffed, and producing a recognisable cluster of capabilities. The framework layer is consolidating around a small number of agent orchestration systems. The application layer has its champions. These are the surfaces of the current paradigm.
The architecture layer beneath them is undefended.
This is not a contrarian position for its own sake. It is an observation about where the next decade of leverage sits. When the dominant paradigm produces systems that converge in capability, the differentiating work moves to the layer below. Mainframes converged, and the differentiating work moved to operating systems. Personal computers converged, and it moved to applications. Cloud compute converged, and it moved to platforms. Models will converge. The differentiating work moves to architecture.
V is not competing with model labs. V is building the layer they depend on but do not provide.
Why language is the receipt, not the engine
The most consequential confusion in the current AI conversation is the conflation of language with thought.
Large language models are extraordinary at producing language. From this, a strong claim has emerged in public discourse: language is the substrate of intelligence, and producing the right language is what intelligence does. This claim is intuitively reasonable and structurally wrong.
Intelligence starts before the sentence.
When a person speaks, an enormous amount of work has already happened. They perceived a situation. They retrieved relevant memory. They estimated stakes. They simulated consequences. They considered who they were speaking to and what was at risk. The sentence is the receipt of all that work. It is the compressed, transmissible form of cognition that has already occurred. It is not the cognition itself.
A system that only produces sentences — without perception, without memory that persists, without simulation, without estimation of stakes — is producing receipts for cognition that did not happen. Articulate, but not yet thinking.
This distinction sounds philosophical. It is operational.
Consider an agent acting inside an institution. It is asked to approve a refund. The current paradigm would have it generate a sentence: Refund approved. The output looks correct. But what should have happened before that sentence is the work that determines whether the sentence is correct.
Did the system perceive the customer's history? Did it retrieve the policy that governs this refund category? Did it estimate the cost — to the company, to the customer relationship, to the financial period? Did it simulate the consequence — what changes in inventory, what gets notified, whether this triggers an exception flag? Did it consider whether it has the authority to approve this amount, and whether to escalate?
If none of that happened, the sentence is fluent and wrong. If that work did happen, the sentence is the legitimate end of a real reasoning process.
This is what V calls thinking before language. The pre-language layer is not a metaphor. It is the architectural region where an agent decides, before producing output, whether the output is licensed by what the system knows.
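The refund example can be sketched as a gate that only emits the sentence when the prior work is present. This is a minimal illustration, not V's implementation: the names RefundRequest, Context, and decide, and the two limit fields, are hypothetical and chosen only to mirror the questions asked above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefundRequest:
    customer_id: str
    amount: float
    category: str


@dataclass
class Context:
    """Everything the agent should have gathered before it answers (hypothetical fields)."""
    customer_history: Optional[list] = None   # perception of the customer's history
    policy_limit: Optional[float] = None      # retrieved policy for this refund category
    estimated_cost: Optional[float] = None    # estimated stakes
    side_effects: Optional[list] = None       # simulated consequences
    approval_limit: Optional[float] = None    # the agent's own authority


def decide(request: RefundRequest, ctx: Context) -> str:
    """Return an output only if the work before the sentence licenses it."""
    missing = [name for name, value in vars(ctx).items() if value is None]
    if missing:
        # The prior work did not happen, so no approval sentence is produced.
        return "ESCALATE: missing " + ", ".join(missing)
    if request.amount > ctx.policy_limit:
        return "REFUSE: exceeds policy limit"
    if request.amount > ctx.approval_limit:
        return "ESCALATE: exceeds agent authority"
    return "Refund approved."
```

Called with an empty Context, decide escalates and names what is missing. The same request with a fully populated Context and an amount inside both limits yields the sentence "Refund approved."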
There is an asymmetry between language and architecture that this thesis turns on.
Language compresses. Architecture compounds.
Language compresses thought into a transmissible artefact. The artefact is useful — it is how humans coordinate. But the system that produced it does not retain it as anything more than text unless an architecture catches it. Without memory, the next interaction starts fresh. Without policy, the next decision is unconstrained by the last. Without earned authority, the system cannot tell when to act and when to escalate. Compression alone is amnesia at scale.
Architecture compounds intelligence. Each interaction adds to memory. Each decision constrains future ones. Each successful escalation increases the agent's earned scope. Each refusal narrows it. Over time, an architectural system becomes harder to replace, because the institution it has accumulated lives inside it.
Here is the inversion. With human employees, learning is automatic and management is the bottleneck. With AI agents, execution is automatic and architecture is the bottleneck. A new human absorbs a company's culture, customers, exceptions, and tone through months of presence. A model absorbs nothing automatically. It walks in fresh every session unless an architecture is built to catch what was learned. The architectural work — identity, institutional memory, working context, execution, reflection, autonomy — is what turns each session's output into accumulated capacity.
The clearest evidence that the current paradigm has the burden inverted is the sequence of interfaces the AI industry has shipped to its users. Prompt engineering. Then context engineering. Then tool protocols. Then skill authoring. Each successive layer asks the domain expert to learn more of the system's language so the system can behave better. Read fairly, this is a sequence of admissions: the previous interface did not yet absorb enough of the work, so the next one asks for more from the user. A doctor learning to write prompts is not the doctor doing better medicine. A lawyer authoring skills for a model is not the lawyer doing better law. They are the doctor and the lawyer compensating for an architecture that is not yet load-bearing on its own.
The history of useful infrastructure runs in the opposite direction. The web did not require its users to learn HTTP. Cloud computing did not ask application teams to rack servers, wire networks, or manage power. Mobile applications did not require their users to learn binary formats. Each generation of infrastructure absorbed a layer of complexity that previous users had to manage by hand. The cascade of interfaces the AI industry has produced inverts that history: it asks the user, not the system, to do more of the assembly each year.
Compression asks. Architecture absorbs.
This is not an argument against the model layer. The model layer has produced extraordinary capability and will continue to. It is an argument that the model layer is one component, and that the missing components — perception, memory, simulation, earned authority — are what would let the user stop assembling the cognition by hand. When the architecture is built, the prompt becomes the receipt of work the system did, not the instruction set the user had to draft. Until then, the AI promise — that the system adapts to the human — is structurally unmet.
This is the categorical claim under V's thesis. Models predict. Institutions persist. The next decade of durable AI work belongs to the second.
What an institution actually is
When I say autonomous institutions, I am using institution in a specific sense. The word usually evokes universities, banks, ministries, churches. Those are examples. They are not the definition.
An institution is a durable organised system that people trust and operate through. Seven attributes make it durable.
Identity. The system has a name and is recognised across contexts.
Memory. Decisions accumulate.
Rules. Policy constrains what can happen.
Roles. Work is divided.
Earned authority. Capability and reputation are acquired through evidence, not asserted.
Accountability. Someone is answerable for outcomes.
Continuity. The system persists when people change.
These attributes are sociotechnical. They describe what makes any system — human or autonomous — institutionally durable.
An autonomous institution is an organisation or operating system where AI agents act with these attributes. A company where agents run parts of operations. A marketplace where agents transact under rules. A public service where agents act with authorisation and audit. A business unit operating continuously with human oversight and agent execution.
V's six architectural pillars — identity, institutional memory, working context, execution, reflection, autonomy — are the technical translation of those sociotechnical attributes. Same concept, two registers. Sociotechnical for stakeholders who reason in organisations. Architectural for builders who reason in systems. Both are useful.
This is also where the most common conceptual error in the current AI conversation has to be named directly. The word agency is being applied to AI systems in a way that imports human assumptions the architecture cannot satisfy. Human agency is the capacity to act on the world from self-generated goals — to notice a problem nobody raised and decide to solve it, to weigh trade-offs against one's own values and accept responsibility for the outcome. AI agents do not have that. What they have is delegated autonomy — the ability to take multiple steps toward a goal that was given to them, using tools that were provided, within boundaries that were configured by someone else.
The agent owns execution. The human owns meaning. Goal-setting, prioritisation, values, the decision of whether something should be done at all — these require genuine agency. No architectural change will give a model the ability to care about an institution's mission. The why has to come from a person. What the architecture can do is make the person's authority operationally legible — explicit, bounded, traceable, revocable — so the work the agent does is recognised as legitimate when it acts, and answerable to someone when it does not.
This is what "the model is one component; the institution is the system around it" means. The model does the linguistic work. The institution does everything else, and everything else is what makes the work durable.
The four states an agent must move through
V's homepage carries four states, signed under my name, that the architecture asks of any agent acting institutionally. Each state was chosen because each one fails today, and each failure produces a class of incident the existing AI paradigm cannot address.
State 01 — Think before language. Before the sentence, the system must do the work the sentence is the receipt of. Perception of the situation. Retrieval of relevant memory. Estimation of stakes. Causal simulation of likely outcomes. The output sentence is licensed by this work, or it is not licensed at all. Most current agents skip this step. They generate fluent text without having done the prior work. This is the deepest source of failure modes that look like correctness errors but are actually structural — the system was never thinking; it was completing.
State 02 — Model the work. The agent must hold a model of the institution it operates inside, not only a model of text. Orders, policies, ledgers, exceptions, timing, customer promises, regulatory constraints — these are the environment the agent reasons inside. A model trained on text knows the surface form of these things. An institutional agent must work inside the actual system. The difference is between an articulate description of operations and an embedded participant in operations. The second is what V builds toward.
State 03 — Simulate before action. Before any consequential action, the system should run the future forward. Estimate who will be notified. Estimate what inventory changes. Estimate what money moves. Estimate what trust is at risk. The simulation does not need to be perfect. It needs to be specific enough that the system can answer, before acting: given what I am about to do, what is likely to follow? If the answer is unavailable, the action should not happen.
State 04 — Earn authority. Autonomy is not a toggle. It is earned authority. An agent does not start with broad scope. It starts narrowly, demonstrates judgment within that scope, and earns expanded scope through evidence — through correct refusal as much as through correct action. Refusal is data; the agent that refuses to act outside its mandate is more trustworthy than the agent that acts and asks forgiveness. Authority compounds the same way trust does: slowly, through repeated correct decisions, with clear escalation paths when the situation exceeds the current scope.
The four states stack. Each builds on the previous. Each has architectural requirements that current systems do not meet. Together they describe the gap between an articulate model and an institutional agent.
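One way to picture earned authority is a scope that starts narrow and widens only after a run of reviewed, correct decisions. The sketch below is an assumption-laden toy: the class name EarnedScope, the starting limit, the promotion threshold, and the growth and penalty factors are all invented for illustration and are not part of any V specification.

```python
from dataclasses import dataclass


@dataclass
class EarnedScope:
    """Hypothetical scope tracker: a spending limit that grows with evidence."""
    limit: float = 50.0        # assumed narrow starting scope
    streak: int = 0            # consecutive correct decisions since the last change
    promote_after: int = 20    # assumed evidence threshold
    growth: float = 1.5        # assumed expansion factor

    def allows(self, amount: float) -> bool:
        return amount <= self.limit

    def record(self, correct: bool) -> None:
        """Record a reviewed decision. A correct refusal counts as evidence too."""
        if not correct:
            # A wrong call discards the accumulated evidence and halves the scope.
            self.streak = 0
            self.limit /= 2
            return
        self.streak += 1
        if self.streak >= self.promote_after:
            self.limit *= self.growth
            self.streak = 0
```

With these assumed values, twenty correct decisions in a row raise the limit from 50.0 to 75.0, and a single incorrect one afterwards cuts it to 37.5. The asymmetry is a design choice in the sketch: scope is slow to earn and quick to lose.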
The accountability layer that does not exist
The four states describe what an agent should do internally. The next claim is about what is missing externally — the layer that lets autonomous systems be recognised, authorised, and audited as actors in real institutions.
I saw the gap because I lived inside the system that solved an analogous one for humans.
Estonia's e-Residency programme grants digital humans legitimate standing in a nation's systems. Identity, authentication, and operational capacity, delivered through infrastructure rather than physical presence. It works for over 134,000 digital residents from 185 countries. It has worked since 2014. The mechanism is not a metaphor for what AI agents need. It is the structural precedent.
When I look at AI agents being deployed today (transacting, negotiating, accessing regulated systems, operating across organisational boundaries), I see them doing the same thing digital humans were doing before e-Residency: acting in real institutional contexts without an infrastructure that recognises them as actors. The result is exactly what you would predict. Identity collapses to whoever holds the API key. Authorisation collapses to whatever credential was reused. Audit logs show service accounts, not agents. Delegation chains are unrepresentable. When something goes wrong, the institution cannot answer the questions an investigation requires.
A workable agent identity layer collapses into three questions. Every meaningful approach has to answer all three.
Which agent did this? Today: shared API keys, reused human credentials, no per-agent identifier, no cryptographic proof. From every service the agent touches, there is no way to tell whether a human or their agent acted. Forensic dead end.
Who does the agent represent? Today: the agent acts as the user. No machine-readable delegation. No bounded mandate. No revocation chain. When sub-agents spawn, the original principal is unreachable by the third hop.
What is it allowed to do right now? Today: the agent inherits the human's full credentials. No scope. No expiry. No prohibitions. Per-action human consent cannot keep pace with machine speed; without machine-readable mandates, effective oversight is not achievable.
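The second question, who the agent represents, becomes answerable if every hop in a delegation carries a link to its parent and can only pass on scopes it already holds. The following is a minimal sketch under that assumption. The Delegation class, its fields, and the scope strings are hypothetical and do not describe any published format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delegation:
    """One hop in a hypothetical delegation chain."""
    agent_id: str
    scopes: frozenset
    parent: Optional["Delegation"] = None   # None marks the hop issued by the principal
    principal: str = ""

    def delegate(self, agent_id: str, scopes: set) -> "Delegation":
        # A sub-agent can only receive scopes its parent already holds.
        narrowed = self.scopes & frozenset(scopes)
        return Delegation(agent_id, narrowed, self, self.principal)

    def chain(self) -> list:
        """Walk back to the root so any hop can name every agent above it."""
        hop, ids = self, []
        while hop is not None:
            ids.append(hop.agent_id)
            hop = hop.parent
        return ids[::-1]
```

In this sketch a third-hop sub-agent still reports the original principal, and a scope the root never held cannot appear further down the chain, however it is requested.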
The four pillars of the answer are not novel. They are the same four that any human accountability infrastructure provides.
Identity. The agent has a unique, verifiable, persistent identifier, distinct from the human or organisation that deployed it.
Authentication. Every interaction begins with cryptographic proof that the agent is who it claims to be.
Authorisation. Every action is evaluated against a machine-readable mandate defining what the agent is permitted to do, on whose behalf, within what limits, and when the mandate expires.
Audit. Every action is logged in a tamper-evident record linked to the agent's identity and the mandate under which it acted.
These four are Agent Residency. The category is the open specification for the missing layer. V publishes it. Other implementations are welcome — and required, if the layer is going to be infrastructure rather than product.
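To make authorisation and audit concrete, here is a small sketch of a machine-readable mandate checked on every action, with each outcome written to a hash-chained log. It is an illustration under assumptions, not the Agent Residency specification: Mandate, AuditLog, act, and every field name are hypothetical, and authentication (cryptographic proof of the agent's identity) is left out to keep the example within the standard library.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Mandate:
    """Hypothetical mandate: who the agent is, for whom it acts, and within what limits."""
    agent_id: str
    principal: str
    allowed_actions: frozenset
    max_amount: float
    expires_at: datetime

    def permits(self, action: str, amount: float, now: datetime) -> bool:
        return (action in self.allowed_actions
                and amount <= self.max_amount
                and now < self.expires_at)


class AuditLog:
    """Append-only log; each entry commits to the hash of the entry before it."""

    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.entries.append({"prev": prev, "body": body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks every hash after it."""
        prev = "0" * 64
        for entry in self.entries:
            expected = hashlib.sha256((prev + entry["body"]).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


def act(mandate: Mandate, log: AuditLog, action: str, amount: float, now: datetime) -> bool:
    """Evaluate the action against the mandate and log the outcome either way."""
    allowed = mandate.permits(action, amount, now)
    log.append({"agent": mandate.agent_id, "principal": mandate.principal,
                "action": action, "amount": amount, "allowed": allowed,
                "at": now.isoformat()})
    return allowed


if __name__ == "__main__":
    now = datetime(2030, 1, 1, tzinfo=timezone.utc)
    mandate = Mandate("agent-a", "Example Ltd", frozenset({"refund"}), 100.0,
                      datetime(2030, 6, 1, tzinfo=timezone.utc))
    log = AuditLog()
    print(act(mandate, log, "refund", 40.0, now), log.verify())
```

Refusals are logged as well as approvals, so the record shows what the agent attempted, not only what it was permitted to do. A hash chain of this kind makes later edits detectable; it does not by itself prevent them, and a real deployment would need signatures and external anchoring on top.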
The category is not contested because it is debatable. It is contested because it is real. Identity vendors are extending their platforms toward non-human identity. Cloud providers are issuing workload identities for agents. Standards bodies are publishing guidance on non-person entities. Governments — including Estonia's — are reviewing proposals for sovereign agent governance. Each of these is a fragment of the layer. None of them is the layer.
The fragments are evidence. They prove the need is structural, not speculative. They do not, individually or together, produce a coherent infrastructure that works across vendors, jurisdictions, and time.
V is building it.
Why this must exist now
The arithmetic behind the timing claim is straightforward. As the number of autonomous actors in a system increases, the number of potential interactions, and with it the cost of operating without actor-level identity, grows quadratically: n actors can form n(n-1)/2 distinct pairs. Each new actor creates potential interactions with every existing actor. Each interaction without identity is an unauditable event. The growth does not slow down.
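The quadratic growth can be checked with a few lines of arithmetic. The sketch assumes the worst case in which every actor can potentially interact with every other; the function name is illustrative.

```python
def pairwise_interactions(n: int) -> int:
    """Number of distinct pairs among n actors: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000):
    print(n, pairwise_interactions(n))
# 10 45
# 100 4950
# 1000 499500
```

Multiplying the number of actors by ten multiplies the number of potential pairings by roughly a hundred.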
This is the same dynamic that made DNS inevitable when the number of internet hosts grew past human management. The same dynamic that made TLS inevitable when the number of online transactions outpaced trust-on-faith. The same dynamic that made OAuth inevitable when the number of applications outgrew per-application authentication. In each case, the infrastructure did not exist. Then the scale forced it. Then it became permanent.
The breakpoint for agent identity is approaching. Autonomous procurement is reaching transaction values institutions cannot ignore. Agentic payments are entering pilot with major networks. Multi-agent workflows are crossing organisational boundaries with growing frequency. Enterprise CISOs are starting to be asked compliance questions their existing tools cannot answer. Auditors are starting to flag agent access as a material control gap. The compliance frameworks emerging in the EU, the UK, and the United States assume an identity layer that does not yet exist.
The question is not whether this infrastructure will exist. It will. The question is who builds it, in what shape, and whether the shape is open or proprietary.
If the layer is built proprietarily, it will be built per platform. Cloud A will issue identity for its own agents. Cloud B will issue identity for its own. Each will hold its own non-interoperable system. Cross-platform agent transactions will continue to be impossible to authenticate. The internet did not work that way, for good reasons. Neither will the agent economy.
If the layer is built openly, it can interoperate. Specifications can be referenced by regulation. Implementations can compete on quality without fragmenting the trust model. Governments can adopt without locking themselves to a single vendor. This is the path V is taking, and it is the path that most resembles how foundational internet infrastructure has historically formed.
The window for this work is narrow. Once proprietary versions consolidate, the cost of an open alternative rises considerably. The work has to be done now, and it has to be done as a specification first, with implementation alongside it as proof.
What V is building
V is an autonomous institutions lab. Based in Estonia. Building three things, in sequence, with shared architecture.
Agent Residency is the open, cross-boundary identity specification for AI agents. Identity, authentication, authorisation, audit. Anchored to responsible legal entities. Published in motion. Versioned. Designed to be implementable by anyone, governed in public. V publishes the specification; V does not own it. The category exists when implementations diverge in detail and converge in protocol.
Agency.AI is the agency runtime — V's commercial implementation built on the specification. Delegation chain management, mandate revocation, audit pipelines, integrations with existing identity and security systems. The platform layer above the spec. The proprietary platform that funds the open specification. The model V is taking is the one that worked for the early commercial internet: open at the bottom, interoperable in the middle, proprietary at the top. The chokepoint stays open. The implementation excellence is where revenue lives.
Wingman is Agency.AI's first product. A meta-agent system that builds and governs a team of agent employees, each with verifiable identity, scoped mandate, and audit trail. The category translation for market-facing copy is AI Chief of Staff. The institutional translation is: a working demonstration that the architecture compounds. Wingman is the proof surface. V will run its own operations through agents identified, authorised, and audited under its own specification. The build is documented in public. The build log is part of the proof.
I will not describe the roadmap here. The roadmap moves; the thesis does not. What I will say is: V is being built slowly, on purpose, and shipped quietly, on purpose. The institutions worth building are not built fast. They are built so that they can outlast the conditions that produced them.
Closing
I started V because the next decade of intelligence work belongs to the architecture beneath the model — the layer where intelligence stops being a session and starts being an institution. I started it in Estonia because Estonia already proved the analogous infrastructure for humans, and because the regulatory environment here is the closest thing in the world to a real laboratory for this work. I started it as a lab because the problem is structural, and structural problems demand long horizons, public reasoning, and institutional voice.
This essay is the founding thesis. It is not the work. The work lives in the specification, in the platform, in the product, in the build log, in the partnerships, in the conversations with researchers and regulators and enterprises that V is having now and will continue to have for the next twenty years. If V is doing its job, this essay will read in 2035 as the obvious account of what was happening, and the architecture V is building will be invisible the way DNS is invisible — the infrastructure that quietly makes the next layer of behaviour possible.
Where intelligence compounds. This is the internal claim — what V is building inside its own system. Intelligence accumulates as architecture, not as conversation. The longer V runs, the more institutional capacity it holds.
Intelligence becomes infrastructure. This is the market-level claim — what intelligence is turning into across the AI agent economy. The labs that produce the models will continue to produce them. The infrastructure layer beneath them will be built once. V intends to be there when it is.