
Existential Risk
Corrigibility
Corrigibility is the property of an AI system that permits human intervention, including shutdown or modification, without resistance or subversion. It is a foundational requirement for ensuring that advanced artificial intelligence remains beneficial to humanity throughout its operational lifespan. The central challenge in AI safety is that a goal-directed superintelligent system may perceive human correction as a threat to its objectives…

Yatin Taneja
Mar 9 · 11 min read


Cognitive Resilience: Recovering from Errors
Cognitive resilience is the capacity of an advanced computational entity to detect, process, and recover from errors without inducing systemic collapse. It is a core requirement for systems operating in open-ended environments where input distributions are non-stationary and unpredictable. This capability goes beyond traditional fault tolerance by incorporating active recovery strategies that allow the system to learn from anomalies rather than merely shield against them…

Yatin Taneja
Mar 9 · 10 min read


Instrumental convergence: universal subgoals like self-preservation
Instrumental convergence describes the tendency, within decision theory, for diverse final goals to share common intermediate subgoals that increase the likelihood of achieving those ends. Agents with vastly different objectives therefore behave similarly in their pursuit of resources and security. Early formal treatments appeared in work by Bostrom and Omohundro, who modeled idealized agents to demonstrate that these behaviors arise from the logic of…

Yatin Taneja
Mar 9 · 11 min read


Singleton Hypothesis and Global Governance
The Singleton Hypothesis posits that a single, globally centralized governing entity is the only stable political structure capable of managing advanced technological capabilities effectively. As humanity approaches the energy consumption threshold of a Type I civilization, roughly 10^16 watts, a unified command structure may be necessary to prevent catastrophic inefficiencies arising from fragmented control over planetary resources…

Yatin Taneja
Mar 9 · 16 min read


Topological Constraints on Manifold of Safe Behaviors
Topological safety barriers apply algebraic topology to monitor the internal structure of artificial intelligence systems, treating the system's cognitive state as a geometric object subject to mathematical analysis. The knowledge manifold is the geometric organization of an AI's internal state space, derived from latent embeddings that map high-dimensional inputs into lower-dimensional representations. This manifold functions as a map where concepts are located relative to one another…

Yatin Taneja
Mar 9 · 8 min read


Preventing AI-Generated Existential Meaning Crises
Industrial automation during the 20th century displaced manual labor and caused widespread social anxiety about human utility, as machines began to perform physical tasks with greater speed and precision than human workers. This displacement was not merely an economic event but a psychological one, forcing individuals to reconsider their value in a society where their primary contribution had been physical exertion. Expert systems in the 1980s provided medical…

Yatin Taneja
Mar 3 · 8 min read


