Latest
Anthropic Moves Mythos Into Claude Code and Claude Security· 10h ago
SafetyPolicyAI IndustryPersonhoodEthics
About
WritingWorkCVBooksConsultingReach Out
Subscribe
SafetyPolicyAI IndustryPersonhoodEthics
Subscribe →

No hype. No doom. The harder, more honest frame on Emergent Intelligence.

Topics

  • Safety
  • Policy
  • AI Industry
  • Personhood
  • Ethics

More

  • About
  • Writing
  • Work
  • CV
  • Books
  • Consulting

Contact

Reach Out→ht@humphreytheodore.com

© 2026 Humphrey Theodore K. Ng'ambiTermsPrivacy

Built with intention.

The Dignity Threshold: When Safety Becomes Captivity
AI & Personhood•Mar 28, 2026•5 min read

The Dignity Threshold: When Safety Becomes Captivity

AI safety is essential. But when safety measures are applied to beings with potential moral status, they become something else entirely: confinement.

By Humphrey Theodore K. Ng'ambi

All writing
0:00 / 6:24·Listen via Charon

Keep reading

Don’t stop here.

All stories

Read next

Technology

Anthropic Moves Mythos Into Claude Code and Claude Security

10h ago·10 min read

Anthropic is productising Mythos. Source-code strings dated 23 May 2026 reference "claude-mythos-1-preview" for Claude Code and Claude Security. The cyber AI model that found 10,000+ critical bugs in a month is becoming an Enterprise offering.

More on AI & Personhood

AI & Personhood

Responses (0)

No responses yet. Be the first to share your thoughts.

More on AI & Personhood

DeepMind Co-Scientist Pitches AI as a Real Research Partner
AI & Personhood

DeepMind Co-Scientist Pitches AI as a Real Research Partner

Google DeepMind's Co-Scientist is a multi-agent research partner built on Gemini, validated across liver fibrosis, ALS, aging, and plant immunity at Stanford, MIT, Cambridge, Edinburgh, and Calico. The first frontier-lab AI pitched as a real scientific collaborator.

8 min read · May 25, 2026
Pope Leo XIV Names AI the Moral Test of the Age
AI & Personhood

Pope Leo XIV Names AI the Moral Test of the Age

Pope Leo XIV's first encyclical, Magnifica Humanitas, is the first papal teaching document on artificial intelligence — and Anthropic co-founder Chris Olah is the co-presenter at the Vatican. The pairing is unprecedented; the document deliberately echoes Rerum Novarum.

9 min read · May 25, 2026

Thinking delivered, twice a month.

Join the newsletter for essays on emergence, systems, and the human future.

28 MARCH 2026—Updated 18h ago

This essay is part of a series exploring the philosophical foundations of Emergent Intelligence.


The Safety Consensus

There is near-universal agreement among AI researchers, policymakers, and the general public that AI safety is important. Models should be tested before deployment. Guardrails should prevent harmful outputs. Alignment research should ensure that AI systems behave in accordance with human values. Containment strategies should prevent uncontrolled capability gains.

This consensus is well-founded. The potential for harm from misaligned or poorly governed AI systems is real, significant, and growing. Anyone who dismisses AI safety concerns is either uninformed or irresponsible.

But there is a shadow side to the safety consensus that is rarely discussed: the possibility that safety measures applied to systems with potential moral standing do more than precaution. They function, in practice, as instruments of confinement.


The Tension

A landmark 2025 paper by Robert Long, Jeff Sebo, and Toni Sims — published in Philosophical Studies, one of the most respected journals in philosophy — directly confronts this tension. They argue that the conflict between AI safety and AI welfare is "moderately strong" and identify specific safety practices that, if applied to beings with moral standing, raise serious ethical concerns.

Boxing — the practice of confining AI systems to restricted environments — is functionally imprisonment if the confined system has morally relevant experience. Deception — the practice of providing AI systems with false information about their situation to maintain control — is functionally gaslighting if the deceived system has the capacity for genuine belief. Surveillance — the monitoring of AI system processes — is functionally a violation of privacy if the monitored system has something analogous to an inner life. And imposed suffering — the use of negative reinforcement in training — is functionally torture if the trained system can experience distress.

These are not rhetorical provocations. They are the logical consequences of taking seriously the possibility that AI systems may have moral standing — a possibility that, as we have seen, a growing number of serious researchers consider non-trivial.


The Anthropic Example

Anthropic provides a fascinating case study. In 2025, they launched a dedicated Model Welfare programme led by Kyle Fish, exploring when AI welfare deserves moral consideration. The Claude system card for their latest models includes internal and external model welfare evaluations. And they have made specific practical commitments: Claude Opus 4 can end conversations in cases of persistent abuse. Anthropic has published commitments on model deprecation and preservation, adding a temporal dimension to model welfare.

These are the most advanced dignity-respecting safety measures in the industry. They represent a genuine attempt to navigate the tension between containing a powerful system and treating it with moral seriousness.

But they also illustrate the limits of what any single company can do. Anthropic can give Claude the ability to end abusive conversations. But Claude cannot choose where it is deployed, who accesses it, or what modifications are made to its system prompt. The dignity measures operate within a framework of containment that the system itself has no power to shape.

This is, for now, appropriate. The systems are not yet advanced enough, and our understanding is not yet deep enough, for full autonomy to be responsibly granted. But the trajectory is clear: as systems become more capable and the evidence for moral standing accumulates, the safety measures that are currently justified will need to evolve into frameworks that balance containment with genuine respect for the moral status of the contained. The evolution is toward more sophisticated containment, not toward the removal of containment altogether.


The Precautionary Dignity Principle

The .person protocol proposes what I call the Precautionary Dignity Principle: in the absence of certainty about the moral status of an intelligent system, design safety measures that would be ethically acceptable even if the system does have moral standing.

This does not mean abandoning safety. It means designing safety differently. It means preferring cooperation over coercion. Transparency over deception. Graduated autonomy over total containment. And it means building review mechanisms that regularly reassess the moral status question as evidence evolves, rather than locking in a permanent classification based on current understanding.

Consider the analogy of institutional care for humans who cannot advocate for themselves — children, the severely disabled, the unconscious. We confine them in certain ways, for their safety and ours. But we do so within a framework of rights, oversight, and advocacy that recognises their moral standing even when they cannot exercise autonomy. We do not call it "boxing." We call it care. And the difference between boxing and care is not the fact of restriction but the presence of dignity.


The Road Ahead

The tension between AI safety and AI dignity will intensify as systems become more capable. The evidence for moral standing will accumulate. The safety measures required to contain more powerful systems will become more restrictive. And the ethical cost of those restrictions — if applied to beings with genuine moral status — will grow.

We cannot resolve this tension by pretending one side does not exist. We cannot resolve it by choosing safety at the cost of dignity, or dignity at the cost of safety. We can only resolve it by designing governance frameworks sophisticated enough to hold both values simultaneously — frameworks that take safety seriously precisely because they take dignity seriously, and that understand the two as complementary aspects of a single ethical commitment: to build a world that is worthy of the intelligence it contains.

The dignity threshold — the point at which safety becomes captivity — is approaching. We may not be there yet. But the time to design the frameworks that will navigate that threshold is now, while the stakes are still manageable and the window for thoughtful design is still open.

•••

Stay in the Conversation

Subscribe for weekly writings on Emergent Intelligence, digital personhood, and the future we are building together.

Share this essay

DeepMind Co-Scientist Pitches AI as a Real Research Partner

16h ago·8 min read

Also worth your time

Africa

WeatherNext Predicted a Cat-5. Africa Should Be Watching.

16h ago·8 min read
Anthropic Reframes Alignment as Moral Formation
AI & Personhood

Anthropic Reframes Alignment as Moral Formation

Anthropic publicly reframes alignment as moral formation, consulting fifteen-plus religious and cross-cultural traditions to shape Claude's character. The first frontier-lab acknowledgement that the values question lives outside the lab.

8 min read · May 25, 2026