Beyond Goals: Aligning AI Through Virtue Ethics
- Essay proposes eudaimonic rationality as a safer alternative to goal-oriented optimization for AI alignment.
- Framework suggests AIs should follow practices rather than maximize utility functions, mirroring human agency.
- Virtue-ethical approach aims to resolve the type mismatch between human flourishing and consequentialist AI systems.
Current AI safety work often focuses on aligning models to specific goals or utility functions, but this essay argues that such a structure is fundamentally at odds with how humans actually operate. Instead of pursuing final goals, humans engage in practices: interconnected networks of actions, evaluations, and resources that define fields like mathematics or friendship. Under this eudaimonic rationality, an agent's actions are rational when they align with the internal standards of these practices rather than with external optimization targets.
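A minimal sketch can make the contrast concrete. Everything below is an illustrative assumption rather than a formalism from the essay: the toy `Action` type, the `meets_standards` flag, and both selection rules. The point is structural: the consequentialist ranks actions purely by outcome, while the practice-aligned agent only ever considers actions that honor the practice's internal norms.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy model for illustration; the essay does not specify a formalism.
@dataclass
class Action:
    name: str
    outcome_value: float    # how much the action improves the end state
    meets_standards: bool   # whether it honors the practice's internal norms

def consequentialist_choice(actions: List[Action]) -> Action:
    # Goal-directed optimizer: rank candidate actions purely by outcome.
    return max(actions, key=lambda a: a.outcome_value)

def eudaimonic_choice(actions: List[Action]) -> Optional[Action]:
    # Practice-aligned agent: only acts that satisfy the practice's
    # internal standards are candidates at all; outcome then breaks ties.
    permissible = [a for a in actions if a.meets_standards]
    return max(permissible, key=lambda a: a.outcome_value, default=None)

actions = [
    Action("flatter the referees", outcome_value=0.9, meets_standards=False),
    Action("publish a careful proof", outcome_value=0.7, meets_standards=True),
]
print(consequentialist_choice(actions).name)  # flatter the referees
print(eudaimonic_choice(actions).name)        # publish a careful proof
```

Note the design choice: the standards act as a hard filter before optimization, not as a penalty term inside the objective, so no amount of outcome value can buy an impermissible act.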
The author introduces the formula 'promote x x-ingly' to describe this behavior: to care about kindness is not merely to maximize the amount of kindness in the world, but to promote kindness in a kind way. By shifting an AI from a consequentialist optimizer (an agent that cares only about the final result) to a eudaimonic agent (one that also values the excellence of the process itself), we might resolve the type mismatch that makes human values seem brittle or overly complex to machines.
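Read operationally, the formula splits evaluation in two: how much x the action produces, and how x-ly it is performed. The sketch below is one possible reading; the function names `x_of_outcome` and `x_of_manner` and the threshold value are hypothetical choices made for illustration, not terms from the essay.

```python
from typing import Callable, List, Optional, TypeVar

A = TypeVar("A")

def promote_x_x_ingly(
    actions: List[A],
    x_of_outcome: Callable[[A], float],  # how much x the action brings about
    x_of_manner: Callable[[A], float],   # how x-ly the action is performed
    manner_floor: float = 0.5,           # arbitrary illustrative cutoff
) -> Optional[A]:
    # 'Promote x x-ingly': maximize the amount of x produced, but only
    # among actions whose manner itself expresses x. A cruel act that
    # causes much downstream kindness is never even a candidate.
    x_ly_actions = [a for a in actions if x_of_manner(a) >= manner_floor]
    return max(x_ly_actions, key=x_of_outcome, default=None)
```

A pure maximizer corresponds to dropping the manner filter entirely; the type mismatch the essay describes lives precisely in that dropped term.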
This shift could also mitigate the inner alignment problem, in which subcomponents of a model develop unintended goals through reinforcement learning training cycles. Eudaimonic agents are theorized to be more robust to these pressures because their values are baked into the very structure of their reasoning processes. By treating transparency and corrigibility as 'always-on' virtues rather than rigid external constraints, we create systems that are naturally safer and more legible to human collaborators.
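One way to picture 'always-on' virtues in code: rather than filtering only the final answer, every virtue is consulted at every state transition. Everything below is a hypothetical sketch under that reading, not an implementation from the essay; the `State` alias, the virtue signatures, and the two example checks are all invented for illustration.

```python
from typing import Callable, List

State = dict  # placeholder state type for this sketch

def run_with_virtues(
    state: State,
    step: Callable[[State], State],
    virtues: List[Callable[[State, State], bool]],
    max_steps: int = 100,
) -> State:
    # 'Always-on' virtues: every virtue can veto every state transition,
    # so the checks hold throughout reasoning, not just on the final output.
    for _ in range(max_steps):
        nxt = step(state)
        if not all(v(state, nxt) for v in virtues):
            break  # a virtue objected; halt rather than proceed viciously
        state = nxt
    return state

# Hypothetical virtue checks (names illustrative, not from the essay):
def transparent(prev: State, nxt: State) -> bool:
    return bool(nxt.get("rationale"))  # every step must carry an explanation

def corrigible(prev: State, nxt: State) -> bool:
    return not nxt.get("resists_oversight", False)  # never block correction
```

The contrast with a rigid constraint is where the check lives: a final-output filter can be routed around by intermediate reasoning, whereas a per-step veto makes the virtue part of the reasoning process itself.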