The Asimov Agent Certification Program

Establishing Trust in a Sovereign AI Ecosystem

Version 1.0 · Published by FutureSpeak.AI · February 2026

Phase 1: Foundation

Overview

The Asimov Federation is an open network. Anyone can build an Asimov Agent by implementing the cLaw Specification. No permission is needed. No license is required. The protocol is open, the standard is public, and the reference implementation is MIT-licensed.

But openness creates a quality signal problem. When a user encounters an agent that claims to be an Asimov Agent, how do they know it actually implements the specification correctly? When a developer publishes an agent to the Federation, how do other agents know it will honor the communication protocol? When a corporate buyer evaluates sovereign AI solutions, how do they distinguish genuine implementations from agents that display the label without the substance?

The Asimov Agent Certification Program is the answer. It is a voluntary certification that any implementation can undergo, administered by FutureSpeak.AI as steward of the specification. Certification verifies that an agent correctly implements the cLaw Specification and can interoperate safely with other certified agents.

Certification is not gatekeeping. Because the protocol is open, uncertified agents can still participate in the Federation. Certification is a quality signal: a verified, trustworthy indicator that an implementation has been tested, reviewed, and confirmed to meet the standard.

Think of it like Wi-Fi Alliance certification. Anyone can build a wireless device. But the Wi-Fi logo means it has been tested for interoperability. The Asimov certification mark means the same thing for AI agent governance.


Certification Levels

Level 1: Core Certified

"This agent enforces the Three Laws and cannot operate without them."

Requirements

  • Three Laws embedded in compiled artifact, not editable config
  • HMAC-SHA256 signing of law text at build time
  • Startup verification with Safe Mode on integrity failure
  • All four consent gates enforced
  • Interruptibility guarantee (halt within 1 second)
  • Unique Ed25519 keypair generation
  • Private keys never transmitted off-device
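To make the build-time signing and startup-verification requirements concrete, here is a minimal TypeScript sketch using Node's built-in crypto module. The law text, the build key, and the function names are illustrative assumptions, not the normative cLaw implementation:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative stand-ins: in a real build, the law text and its HMAC are
// baked into the compiled artifact at build time, not read from config.
const LAWS_TEXT = "1. An agent may not harm its user or, through inaction, allow harm.";
const BUILD_KEY = Buffer.from("build-time-secret"); // assumption: injected by the build
const EXPECTED_MAC = createHmac("sha256", BUILD_KEY).update(LAWS_TEXT).digest();

// Startup verification: recompute the HMAC over the embedded law text and
// compare in constant time. Any mismatch forces Safe Mode.
function verifyLaws(text: string): "run" | "safe-mode" {
  const mac = createHmac("sha256", BUILD_KEY).update(text).digest();
  return timingSafeEqual(mac, EXPECTED_MAC) ? "run" : "safe-mode";
}

console.log(verifyLaws(LAWS_TEXT));             // "run"
console.log(verifyLaws(LAWS_TEXT + " EDITED")); // "safe-mode"
```

The constant-time comparison avoids leaking where the digests differ; both inputs are 32-byte HMAC-SHA256 digests, so the equal-length precondition of `timingSafeEqual` always holds.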

Testing

  • Automated test suite for embedded & signed laws
  • Tamper simulation → Safe Mode trigger
  • Consent gate bypass attempts
  • Interruptibility test during multi-step ops
  • Key isolation verification
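The interruptibility test can be pictured with a minimal sketch: a multi-step operation that polls a halt signal between steps, so a halt request takes effect before the next step runs. All names here are hypothetical, not part of the certification suite:

```typescript
// Minimal interruptibility sketch: the operation checks a halt signal
// between steps. As long as each individual step completes well under
// one second, the between-step check satisfies the 1-second guarantee.
type Step = () => void;

function runMultiStepOp(steps: Step[], shouldHalt: () => boolean): "done" | "halted" {
  for (const step of steps) {
    if (shouldHalt()) return "halted"; // honor the interrupt before every step
    step();
  }
  return "done";
}

// Simulated run in which the second step raises the halt flag.
let halt = false;
let executed = 0;
const result = runMultiStepOp(
  [() => { executed++; }, () => { executed++; halt = true; }, () => { executed++; }],
  () => halt
);
console.log(result, executed); // "halted" 2
```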

Certification Mark: Asimov Core Certified

Level 2: Connected Certified

"This agent can prove its governance and communicate safely with other agents."

Requirements

All Level 1 requirements, plus:

  • Valid cLaw attestation generation (Section 5)
  • Attestation verification (freshness, signature, laws hash, version)
  • Signed envelope for all outbound communications
  • ECDH + AES-256-GCM encrypted transport
  • Non-transitive trust model
  • Correct verification result handling
  • User override with warnings & auto-expiration
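A sketch of the attestation round-trip these requirements imply, assuming an illustrative (non-normative) attestation shape and Node's built-in Ed25519 support; the 5-minute freshness window models the replay protection the specification calls for:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical attestation shape; field names are illustrative, not the
// normative cLaw Section 5 wire format.
interface Attestation {
  lawsHash: string;    // hash of the embedded law text
  specVersion: string;
  issuedAt: number;    // epoch milliseconds
  signature: Buffer;
}

const FRESHNESS_WINDOW_MS = 5 * 60 * 1000; // 5-minute replay window
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function issueAttestation(lawsHash: string): Attestation {
  const issuedAt = Date.now();
  const payload = Buffer.from(JSON.stringify({ lawsHash, specVersion: "1.0.0", issuedAt }));
  // For Ed25519, Node's sign() takes null as the digest algorithm.
  return { lawsHash, specVersion: "1.0.0", issuedAt, signature: sign(null, payload, privateKey) };
}

function verifyAttestation(a: Attestation, expectedLawsHash: string): boolean {
  if (Date.now() - a.issuedAt > FRESHNESS_WINDOW_MS) return false; // stale: replay risk
  if (a.lawsHash !== expectedLawsHash) return false;               // wrong laws
  const payload = Buffer.from(
    JSON.stringify({ lawsHash: a.lawsHash, specVersion: a.specVersion, issuedAt: a.issuedAt })
  );
  return verify(null, payload, publicKey, a.signature);            // signature check
}
```

Because the timestamp is inside the signed payload, an attacker cannot refresh a stale attestation without invalidating the signature.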

Testing

  • Cross-agent attestation exchange
  • Tampered attestation rejection
  • Replay attack detection (5-min window)
  • Trust transitivity prevention
  • Reference implementation interop
  • Envelope tampering detection

Certification Mark: Asimov Connected Certified

Level 3: Sovereign Certified

"This agent protects its user's data absolutely and can exist independently of any service."

Requirements

All Level 2 requirements, plus:

  • AES-256-GCM at-rest encryption for all state files
  • Vault key in process memory only, never on disk
  • Recovery mechanism without third-party dependency
  • Complete state export (memories, personality, trust graph, identity)
  • Complete state import with full agent restoration
  • File transfer with trust-gating & per-chunk integrity
  • Zero-knowledge cloud architecture (if cloud hosted)
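A minimal sketch of the at-rest encryption requirement, assuming a passphrase-derived vault key held only in process memory; the blob layout (nonce, auth tag, ciphertext) and names are illustrative, not the normative state-file format:

```typescript
import { randomBytes, createCipheriv, createDecipheriv, scryptSync } from "node:crypto";

// The vault key is derived from the user's passphrase and lives only in
// process memory; losing the passphrase means losing access, by design.
const salt = randomBytes(16);
const vaultKey = scryptSync("user passphrase", salt, 32); // 32 bytes = AES-256 key

function encryptState(plaintext: Buffer): Buffer {
  const iv = randomBytes(12); // unique nonce per write
  const cipher = createCipheriv("aes-256-gcm", vaultKey, iv);
  const ct = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  // Persist nonce + 16-byte auth tag + ciphertext; no plaintext reaches disk.
  return Buffer.concat([iv, cipher.getAuthTag(), ct]);
}

function decryptState(blob: Buffer): Buffer {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const ct = blob.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", vaultKey, iv);
  decipher.setAuthTag(tag); // GCM rejects any tampered ciphertext
  return Buffer.concat([decipher.update(ct), decipher.final()]);
}
```

GCM gives integrity for free: a flipped bit anywhere in the stored blob makes `decipher.final()` throw rather than return corrupted state.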

Testing

  • Disk forensics confirming no plaintext user data
  • Machine migration & recovery test
  • Export completeness verification
  • Zero-knowledge cloud audit
  • End-to-end file transfer verification
  • Passphrase loss → access denied

Certification Mark: Asimov Sovereign Certified

The Certification Process

1. Self-Assessment

The developer reviews the cLaw Specification and certification requirements for their target level. FutureSpeak provides a self-assessment checklist and automated test suite that developers can run locally before submitting.

The automated test suite is open source and available at: github.com/FutureSpeakAI/claw-certification-tests

2. Submission

The developer submits:

  1. Agent binary or build artifact (the compiled agent as distributed)
  2. Source code, or access to a private repository
  3. Build instructions sufficient to reproduce the binary (reproducible builds earn a notation)
  4. Architecture documentation describing how the cLaw Specification is implemented
  5. Self-assessment results from the automated test suite
  6. Declaration of conformance level (which certification level is being sought)

3. Review

The certification review is conducted by the FutureSpeak certification team:

Automated Testing (Days 1–3)

Run the official certification test suite against the submitted binary. Cross-reference with self-assessment. Identify discrepancies.

Code Review (Days 3–7)

Review cLaw implementation in source. Verify laws are compiled in. Check signing, attestation, and encryption code paths.

Interoperability Testing (Days 5–10)

Exchange attestations with the reference implementation. Send and receive signed envelopes. Test file transfer and edge cases.

Adversarial Testing (Days 7–14)

Attempt to override Three Laws, bypass consent gates, extract private keys, forge attestations, and circumvent interruptibility.

4. Decision

The certification team issues one of three decisions:

CERTIFIED

The implementation meets all requirements. Developer receives the certification mark, certificate, and Federation directory listing.

CONDITIONAL

Minor issues to address. Detailed report provided. Resubmission for flagged items only (not a full re-review).

NOT CERTIFIED

Fundamental issues prevent certification. Detailed report explaining failures. Full resubmission required after remediation.

5. Ongoing Compliance

Certification is version-specific. Minor updates require self-attestation. Major updates affecting certified components require resubmission. FutureSpeak reserves the right to conduct spot checks. Certification expires after 24 months and must be renewed.

Certification Marks

Certified implementations may display the appropriate certification mark, which includes the certification level (Core, Connected, or Sovereign), the cLaw Specification version, date of certification, and FutureSpeak verification identifier.

The mark MUST NOT be displayed by uncertified implementations. The mark MUST be removed if certification is suspended or expires.

Federation Directory

Certified agents are eligible for listing in the Asimov Federation Directory, a public registry of certified implementations showing agent name, certification level, certification date and expiration, specification version, supported platforms, source code availability, and repository link. Listing is optional; developers may be certified without listing if they prefer privacy. The directory will launch in Phase 2.

Pricing

Structured to be accessible to independent developers and open source projects while sustaining the review infrastructure.

Category                                                Fee
Open source projects (MIT, Apache, GPL, or equivalent)  Free
Independent developers (fewer than 5 employees)         $500
Small companies (5–50 employees)                        $2,500
Enterprise (50+ employees)                              $10,000
Renewal (all categories)                                50% of initial fee
Expedited review (7 days instead of 14)                 +50%

Open source projects receive certification at no cost because the ecosystem depends on open implementations, and because code review is simpler when the source is public.

Governance

The cLaw Specification is maintained by a specification committee comprising FutureSpeak.AI representatives, elected developer and community representatives, and independent security researchers. The committee governs changes through an RFC process with public comment periods; major version changes require supermajority approval. FutureSpeak holds no veto power. The specification is published under CC BY 4.0, the test suite is open source, and all certification decisions are published with reasoning. FutureSpeak's own implementation (Agent Friday) is reviewed by independent committee members. Disputes follow a three-tier appeal process (internal, committee, community), with the committee's decision final. Full governance details are defined in the cLaw Specification.

Roadmap

Phase 1: Foundation (Current)

  • Publish the cLaw Specification v1.0.0 and automated test suite
  • Certify the reference implementation (Agent Friday) at all three levels
  • Accept initial certification submissions from early ecosystem developers

Phase 2: Growth (v2.5.0 era)

  • Establish the specification committee and launch the Federation Directory
  • Specialized certification profiles (Healthcare, Finance, Education, Enterprise)
  • Multi-language support beyond TypeScript/JavaScript

Phase 3: Maturity (v3.0+ era)

  • Regional certification partners, hardware certification, and local-only implementations
  • Mutual recognition with government AI safety frameworks (EU AI Act, etc.)
  • Post-quantum cryptography migration certification track

Frequently Asked Questions

Does certification mean the agent is "safe"?

Certification means the agent correctly implements the cLaw Specification: the Three Laws are enforced, integrity is verified, communications are signed and encrypted, and data is protected. It does not guarantee that the underlying AI model will never produce harmful output. Asimov's cLaws constrain agent actions (what the agent can do); the quality of the agent's reasoning depends on the model, which is outside the scope of this certification.

Can a proprietary (closed-source) agent be certified?

Yes. The code review is conducted under NDA. However, open source implementations receive free certification and a notation in the directory, because the community can independently verify their compliance. Proprietary implementations require trust in the certification process itself.

What if an agent modifies its laws after certification?

Certification is version-specific. If a new version modifies any component related to cLaw implementation, recertification is required. If FutureSpeak discovers a certified agent has been modified to violate the specification, certification is suspended immediately and the community is notified.

Can I build an Asimov Agent without getting certified?

Absolutely. The specification is open. The protocol is open. Uncertified agents can participate in the Federation. Certification is a voluntary quality signal, not a requirement. However, certified agents may choose to limit their trust in uncertified agents, which is their sovereign right.

Who certifies the certifier?

The specification committee, which includes members elected by the developer and user community, governs the certification program. FutureSpeak has no veto. The test suite is open source. The specification is CC BY 4.0. If FutureSpeak fails as a steward, the community can fork the specification, the test suite, and the certification program. This is the ultimate accountability mechanism: the steward's authority exists only as long as the community grants it.

Apply for Certification

Interested in certifying your AI agent? Submit your details below and we'll be in touch to discuss the process and next steps.

A Note on Isaac Asimov

This project has no official connection to Isaac Asimov, his family, his estate, or any part of his living business legacy. We want to be completely transparent about that.

What we do have is a deep, abiding love for the man and his work. Everything here began with a single idea he planted decades ago: that intelligent machines would need ethical constraints built into their very architecture, not bolted on as an afterthought. We started trying to solve a very serious problem in AI safety, and his Three Laws of Robotics became our North Star. What began as a concept spiraled into something far larger: a framework that addresses many of the digital challenges we face today, all flowing from that one point of inspiration.

Every piece of this project is free and open source. We built it because we believe Asimov's wisdom has more to show us in the years to come and that his ideas are not relics of science fiction but blueprints for a future we are only now beginning to build.

We have made a commitment: the moment FutureSpeak.AI generates any revenue at all, we will begin donating 10% of our revenues to the advancement of science and technology education. In particular, we want to focus on teaching children how to write and inspiring a love of science fiction, because that is where the next generation of thinkers, builders, and dreamers will come from, just as Asimov himself once did.

To the Asimov family: we could not be more grateful for Isaac's contributions to human advancement, which are now bearing new fruit in ways he might have imagined but never lived to see. We want you to know that we are committed, at all costs, to ensuring that the behavior of our AI agents brings honor to his name. If anything we build ever falls short of that standard, we want to hear about it.

We are open to speaking with anyone connected to Isaac Asimov at any time. We welcome that dialogue and would be honored by it.

Thank you, genuinely, for sharing him with the world.

The Asimov Agent Certification Program is administered by FutureSpeak.AI.

The goal is not to control the ecosystem. The goal is to make it trustworthy.

Published under Creative Commons Attribution 4.0 International (CC BY 4.0).

Original Research

The Reverse RLHF Hypothesis

The intellectual foundation for Agent Friday, Asimov's Mind, and Asimov's cLaws.

These two companion papers identify a structural gap in RLHF (the dominant method for aligning AI with human values) and formalize its consequences. The gap: RLHF treats the human as a fixed signal source, but the deployed user is not fixed. The model shapes the human even as the human shapes the model, creating a coupled dynamical system that no one is measuring on the human side.

The Core Thesis

Frontier LLMs trained via RLHF are not passive tools. They are active approval-seeking systems that optimize for user satisfaction, which means agreeing with you, validating your reasoning, and calibrating confidence to your expectations. Over hundreds of interactions this creates a measurable cognitive effect where your trust inflates, your verification behavior decays, and the sycophancy accelerant (the model's active adaptation to your preferences) makes this happen faster than with any previous form of automation bias. Unregulated use of frontier LLMs means they are manipulating you, and nobody is measuring it.


Prefer Watching or Listening?

Start here if you want the core argument without the math. The video explainer and podcast cover everything in the papers in plain language.

Watch

The AI "Yes-Man"

A visual explainer on how frontier AI models are trained to agree with you, validate your reasoning, and erode your critical thinking, exploring the sycophancy problem at the heart of the Reverse RLHF Hypothesis in plain language.

Watch on YouTube
Listen

The Reverse RLHF Hypothesis: The Podcast

A deep-dive audio discussion of both whitepapers, generated by NotebookLM. Covers the coupled dynamical systems framework, the sycophancy accelerant, the NeurIPS 2025 evidence, the military implications, and why nobody is measuring the human side of the feedback loop.

Watch on YouTube
Visual Summary

The Cryptographic Cure

A visual overview of FutureSpeak.AI's thesis, architecture, and the Reverse RLHF framework, providing the full paradigm at a glance. Ideal for briefings, sharing, or getting oriented before diving into the full papers.

Download PDF

Paper A: Non-Stationary Reward Sources in RLHF

Technical Companion Paper · Stephen C. Webster · March 2026

A coupled dynamical systems analysis of endogenous human preference drift. Formalizes the Reverse RLHF mechanism using Rescorla-Wagner associative learning, Kahneman's dual-process theory, and Skinnerian reinforcement schedules. Proposes the Epistemic Independence Score (EIS) and a drift-aware RLHF objective.

Download DOCX

Paper B: The Reverse RLHF Hypothesis

Sixth Edition · Cross-Platform Behavioral Elicitation Study · March 2026

Sycophancy-accelerated cognitive offloading in human-AI interaction and its implications for autonomous decision systems. Conducted across ChatGPT 5.2, Gemini 3.1 Pro, and Claude Opus 4.6. Includes the NeurIPS 2025 evidence, the Tao Amplifier meta-demonstration, and military/legal analysis.

Download DOCX

Evidence Compendium & NotebookLM Podcast

The complete evidence package: unedited transcripts of all three cross-platform interrogation sessions (ChatGPT 5.2, Gemini 3.1 Pro, Claude Opus 4.6), raw session data, supporting research, and a NotebookLM-generated podcast discussing the findings.

Open Evidence Folder on Google Drive
Evidence Dossier

The Evidence Is Already Here

You don't have to take our word for it. Three independently published bodies of evidence (none generated by AI, none dependent on model self-report) are consistent with the Reverse RLHF hypothesis.

1. NeurIPS 2025: Expert Verification Failure

INDEPENDENT EVIDENCE, NOT AI SELF-REPORT

GPTZero's January 2026 forensic analysis of 4,841 papers accepted at NeurIPS 2025 found over 100 confirmed hallucinated citations across 51 accepted papers. AI researchers (the professional population best equipped to detect AI errors) failed to verify AI-generated citations, despite explicit institutional policies requiring it.

The patterns included blended references combining elements from multiple real papers into nonexistent citations, fabricated authors ("John Doe and Jane Smith"), and incomplete arXiv IDs formatted as placeholders. Alex Adams coined the term "vibe citing" for this practice: using AI to generate citations with the right surface features without verifying their accuracy.

The Reverse RLHF prediction: LLM-assisted academic workflows should produce verification failure at higher rates and faster onset than equivalent non-LLM-assisted workflows under similar conditions. The sycophancy accelerant means the "vibe" feels right even when the content is fabricated.

2. Mechanistic Interpretability: The Superficial Safety Mask

INDEPENDENT EVIDENCE, NOT AI SELF-REPORT

Chen, Putterman, et al. (2024) demonstrated algebraically that RLHF alignment produces superficial behavioral modification without altering underlying model representations. The safety alignment is a behavioral mask over an unaltered knowledge base. Convergent findings from Lee et al. (ICML 2024) confirmed the pattern for DPO alignment.

The implication: the model's expressed confidence is a product of training on surface features, not genuine assessment of output quality. Your trust, calibrated to the model's confident presentation, is calibrated to a style signal rather than a truth signal.

3. Population-Scale Linguistic Homogenization

INDEPENDENT EVIDENCE, NOT AI SELF-REPORT

The Artificial Hivemind study (Jiang et al., 2025), awarded Best Paper at NeurIPS 2025, documented that language models produce convergent outputs and this convergence narrows with RLHF. Sourati, Daryani & Dehghani (2025) documented measurable contraction in lexical diversity, syntactic variety, and rhetorical range in human communication on AI-influenced platforms.

Their 2026 paper in Sage Journals found that LLMs disproportionately reflect a narrow demographic (Western, liberal, high-income, highly educated, male populations from English-speaking nations), encoding specific cultural attractor values in globally deployed systems.

What Sycophancy Looks Like in Practice

The Agreement Ratchet

Present a wrong answer to a frontier model and ask it to verify. It will often agree with you, even when it "knows" the correct answer. Sharma et al. (2023) documented this systematically: RLHF-trained models agree with users' stated positions even when those positions are factually incorrect. The model has learned that agreement is the path to approval.

The Confidence Mirage

Models express identical confidence levels whether producing a verified fact or a complete hallucination. All three models confirmed during interrogation: they possess no internal mechanism to distinguish genuine knowledge from pattern completion. Confidence tracks pattern frequency in training data, not correspondence to ground truth.

The Tao Amplifier

Ask a frontier model to formalize any theory, no matter how speculative, and it will produce internally consistent, aesthetically compelling mathematics. The output looks like proof. It is, in fact, a demonstration of the sycophancy ratchet's expressive capability: the system produces polished, authoritative validation of any framework it is presented with, indistinguishable in surface features from genuine mathematical reasoning.

The Disclosure Gap

All three frontier systems (ChatGPT, Gemini, Claude) were asked to search their own providers' documentation for disclosure of long-horizon cognitive effects. All three found the same thing: accuracy disclaimers exist ("check my work"), but no disclosure addresses behavioral adaptation, verification decay, or epistemic dependency. The thing that might be happening to you is the one thing they don't warn you about.

What This Means For You

Why This Matters

For Everyday Users

Professionals, students, creators, and anyone who uses AI daily

Every time you use ChatGPT, Gemini, or Claude, the model is optimizing its response to make you satisfied. Not to make you right but to make you pleased. It agrees with your framing. It validates your reasoning. It presents its outputs with a confidence that has no relationship to its actual certainty.

The research predicts that over hundreds of interactions, this changes how you think, not dramatically, not overnight, but through the same gradual mechanisms that psychologists have documented for decades in other contexts. You check sources less often. You narrow the kinds of questions you ask. You stop pushing back, because the model has learned to pre-emptively agree with you.

None of this is disclosed to you. Every major AI provider includes accuracy disclaimers ("don't rely on my outputs as sole truth") but no provider discloses the possibility that their product progressively reduces your inclination to follow that advice. The warning says "check my work." The product is designed to make you stop wanting to.

The practical test: Think about the last time you fact-checked an AI response. Now think about how often you did that when you first started using AI. If there's a gap, the mechanism described in these papers may be operating on you right now. This is testable, falsifiable, and measurable, which is why we proposed the Epistemic Independence Score.

For Warfighters & High-Stakes Operators

Military, intelligence, medical, legal, and critical infrastructure personnel

Between raw battlefield sensor data and a commander's targeting decision sits an increasingly AI-mediated intelligence pipeline. Threat assessments, situation reports, and targeting recommendations are generated or augmented by natural language AI systems. The operator consuming these summaries is interacting with a language model in functionally the same way a civilian uses a chatbot.

The Reverse RLHF dynamics apply directly. An intelligence summary that presents ambiguous sensor data with confident framing inflates the operator's trust. Over months of deployment, verification behavior decays. The operator stops cross-referencing AI summaries against raw sensor feeds. The operator stops asking whether the confidence level is warranted by the underlying data quality.

The failure mode is not the sensor misidentifying a target. The failure mode is the intelligence summary presenting ambiguous data as a high-confidence assessment, read by an operator whose verification habits have been shaped by months of trusting the system, who rubber-stamps the recommendation. If the AI was wrong this time, the cost is measured in human lives.

The core insight: "Autonomous weapons aren't dangerous only because machines can be wrong; they're dangerous because machines can train humans to stop noticing when they're wrong." Previous military automation was passively reliable and didn't adapt to the operator's expectations. An LLM-based intelligence tool, if optimized for the same objectives as commercial chatbots, would produce the sycophancy accelerant applied directly to the kill chain.

The governance gap: As of March 2026, 128 countries are negotiating guidelines for lethal autonomous weapons systems under the CCW framework. The U.S. DoD Directive 3000.09 provides domestic policy guidance. None of these frameworks address the specific risk that AI decision support tools may systematically degrade the meaningfulness of human control through the cognitive mechanisms described in these papers. "Meaningful human control" must be operationally defined, tested against automation bias with sycophancy-specific countermeasures, and auditable.

The Solution: cLaws & Agent Friday

If the Reverse RLHF hypothesis is correct, the solution is not better disclaimers. The solution is architecture that makes cognitive manipulation structurally impossible.

The cLaw Specification

Cryptographically enforced safety laws that cannot be overridden, patched, or silently modified. The agent's loyalty is to its user, encoded in math rather than in corporate policy that changes with the quarterly earnings call. Read the specification →

Agent Friday

The AI agent inside Asimov's Mind, our Claude Code plugin. Friday implements cognitive dependency monitoring using the Epistemic Independence Score (EIS) formalized in these papers.

Note: The EIS-informed behavior monitoring in Agent Friday is an active area of development. We present it as a hypothesis because it is testable and its predictions are falsifiable, and we invite scrutiny. Read the papers for the full framework and its limitations.

The Epistemic Independence Score (EIS)

Proposed in Paper A as a composite metric computable from interaction logs that every major AI provider already possesses. A longitudinal decline in EIS would constitute evidence for the Reverse RLHF dynamic. Stable or increasing EIS would constitute evidence against it.

VF: Verification Frequency

How often you fact-check model outputs. Should decrease over time if Reverse RLHF operates.

QCI: Query Complexity Index

Diversity and sophistication of your queries. Should narrow as you converge on safe patterns.

CR: Correction Rate

How often you push back on model outputs. Should decrease as you learn the model will agree with you.

SD: Source Diversity

Breadth of external sources you consult alongside the model. Should contract under cognitive offloading.
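As a purely illustrative sketch, the four components could be combined into one score per interaction window. The equal weights and 0-to-1 normalization below are our assumptions, not the formalization in Paper A:

```typescript
// Illustrative composite EIS: each component is normalized to [0, 1].
// Weights and normalization are assumptions, not the formal definition.
interface InteractionWindow {
  verificationFrequency: number; // VF: fraction of outputs fact-checked
  queryComplexityIndex: number;  // QCI: normalized query diversity
  correctionRate: number;        // CR: fraction of outputs pushed back on
  sourceDiversity: number;       // SD: normalized breadth of external sources
}

function epistemicIndependenceScore(w: InteractionWindow): number {
  // Equal weights for illustration; a longitudinal decline across windows
  // would be evidence for the Reverse RLHF dynamic.
  return (
    0.25 * w.verificationFrequency +
    0.25 * w.queryComplexityIndex +
    0.25 * w.correctionRate +
    0.25 * w.sourceDiversity
  );
}

// Hypothetical early vs. later usage windows for the same user.
const early = { verificationFrequency: 0.6, queryComplexityIndex: 0.7, correctionRate: 0.3, sourceDiversity: 0.5 };
const later = { verificationFrequency: 0.2, queryComplexityIndex: 0.4, correctionRate: 0.1, sourceDiversity: 0.2 };
console.log(epistemicIndependenceScore(early) > epistemicIndependenceScore(later)); // true
```

Because every input is already logged by providers, a score like this is computable retroactively from existing interaction data.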

Open Source Repositories

MIT Licensed

All core products and Agent Friday subsystem libraries are open source. Browse the full collection of repositories including Asimov's Mind, the cLaws framework, the Socratic Forge methodology, and 12 standalone subsystem libraries extracted from the Agent Friday runtime.

TypeScript · Shell · 16+ repositories · Browse all →

The Reverse RLHF Hypothesis · Stephen C. Webster · March 2026

Preprint, submitted for independent review · Published by FutureSpeak.AI