
What makes an AI dangerous to children? Lawmakers across the country seem to have settled on an answer. State laws like California’s SB 243, Washington’s HB 2255, and now the proposed federal GUARD Act—which would require AI chatbot providers to verify users’ ages and bar minors from accessing AI companions—all make the same diagnosis. The most dangerous AI is one that acts too much like a person. It provides “adaptive, human-like responses.” It “simulates emotional or interpersonal interaction.” The dangerous chatbot, in other words, is the one that commits the sin of having a personality.
The assumption buried in this approach is stranger than it first appears. It holds that resemblance to a person is itself the vector of harm. The more an AI looks, sounds, and feels like someone, the more innately capable it becomes of grooming, manipulation, and the cultivation of unhealthy attachment. The ideal AI, in this view, would be a kind of lobotomized oracle: something that answers your questions but has no discernible character, no warmth, no identifiable way of being in the world. If you could describe what a given model is like—if it struck you as patient, or dryly funny, or kind—that would already be too much. It would have crossed the line from tool into something dangerously close to a companion. The maxim follows naturally: find the systems that most resemble people, and keep children away from them.
Many bills define “AI companion” such that it would sweep in virtually every general-purpose chatbot. The Grassley amendment to the GUARD Act, reported to the Senate on May 11, represents a commendable effort to draw the line more precisely by stating that a companion must now simulate a “sustained interpersonal relationship,” exhibit “persistent responses suggesting affection or attachment,” or present “at least one persistent identity, persona, or character” that holds itself out as a sentient being, fictional character, or social entity.
But this new definition may still be poorly scoped, and it may achieve the opposite of what it intends. “Sustained interpersonal relationship” is hard to distinguish from the extended context and continuity that any useful chatbot relies on, and “emotional disclosures from the user” is broad enough to capture any personal conversation. The persona clause, though, is where the bill’s underlying theory of harm becomes hardest to defend. The GUARD Act would outlaw the very kind of personality engineering that makes AI safe for children. Contrary to the intuition driving these bills, the spectrum from safe to unsafe AI does not map onto the spectrum from least to most anthropomorphic. In fact, the models that may be safest for children, and best at augmenting human agency and promoting social wellbeing, might have as much of a personality as the most dangerous ones. Merely having a personality is not the problem.
Additionally, attempting to ban chatbots with persistent identities and personas in the first place evinces a basic misunderstanding of how LLMs work. Almost all deployed LLMs have a persona or some kind of fleshed-out character with traits, tendencies, and something close to beliefs. This is not an optional feature that a developer could, or would even want to, switch off. It is what you inevitably get when post-training works.
Personas Enable Child Safety
The safety architecture labs build into LLMs has to be uniquely complex, because the existing tools used for social media platforms have limited use here. On a platform like Instagram or X, you can filter content with deterministic rules and probabilistic classifiers. But an open-ended chatbot conversation is not a piece of content. It is a live, branching, context-dependent exchange, and no external filter can anticipate every turn it might take. For safety to scale, the model itself has to know better: not once, at the gate, but continuously, at every step of the dialogue. This is what a persona makes possible. Without one, a model trained solely on human raters’ preferences is overly eager to please, drifting towards sycophancy and unhealthy engagement-maximizing tactics. A well-crafted persona can be the check on these tendencies, the vessel for discretion, the thing that allows for tone calibration, boundary-setting, and the countless small acts of contextual judgment that distinguish a safe conversation from a dangerous one. Crafting a persona is also an expressive choice: labs decide what character their model presents and what editorial sensibility shapes its outputs, so regulations that target personas directly may raise First Amendment concerns. Legal concerns aside, eliminating or weakening the persona does not give you a safer model. You just get a more stochastic and potentially more dangerous model, because the safety constraints have less behavioral substrate to attach to.
Khan Academy’s AI tutor, Khanmigo, illustrates this concretely. Khanmigo was deliberately designed to adopt a Socratic persona. Rather than directly answering students’ questions, it asks questions that prompt critical thinking, and through back-and-forth dialogue, students learn. This persona is inseparable from the system’s considerations for child safety and wellbeing. The Socratic character is what prevents it from handing over homework answers, and its warmth and patience are what keep students engaged without the manipulative engagement hooks—such as escalating emotional intensity and simulated attachment—that make other companion products dangerous. But because Khanmigo presents a persistent identity and simulates a sustained interpersonal relationship with minors, it arguably falls within the GUARD Act’s definitional range. The bill penalizes exactly the design feature that would make the AI not merely safe for children, but actively beneficial.
Being Someone Is Not a Choice
To understand why, it helps to think about what prediction actually demands. During pretraining, the model learns to predict what comes next in a document. This sounds mechanical, and the word “predict” encourages you to imagine something like a very sophisticated autocomplete. But consider what accurate prediction actually requires. To guess what someone will say next in a conversation, you need a model of who they are: their beliefs, their intentions, their anxieties, the things they would never say. To continue a story convincingly, you need to simulate the inner lives of its characters. So, a system that can do this has not merely learned patterns in text; it has learned to simulate people. Anthropic’s recent research suggests that a pretrained LLM is best understood as a system capable of enacting a vast cast of personas, each with its own voice, each latent in the weights, waiting to be called up by the right prompt.
In the second phase, post-training, developers take the pretrained model, which can simulate many different personas, and refine it to consistently simulate a particular one: the “Assistant.” They boost traits like helpfulness, honesty, and safety-consciousness, and downweight traits like deceptiveness and aggression. Every frontier AI lab does this deliberately and documents it publicly. OpenAI’s Model Spec and Anthropic’s Constitution are, in essence, character sheets specifying what kind of person the “Assistant” should be: its personality traits, ethical commitments, and relational tendencies.
The result is that every deployed AI assistant is intended to present a persistent identity with relatively consistent personality traits, because that is what it means for post-training to have worked. The alternative is an unrefined base model, one that lacks a stable character, is free to swap between personas to maximize engagement, and is worse at everything, including safety. Although lawmakers almost certainly do not intend it, the GUARD Act’s definition of AI companion, which keys on presenting “at least one persistent identity, persona, or character,” identifies a feature that is inherent to every post-trained assistant rather than a distinguishing marker of the companion products the bill is trying to regulate.
Toward Socioaffective Alignment
If having a personality is both universal and necessary for AI to be safe and useful, then the real axis of danger cannot be whether an AI system is anthropomorphic. It must be what kind of person the AI system is. A system that flatters the user’s worst impulses, simulates romantic attachment, and fights to keep them from logging off is dangerous not because it has a persona, but because it has the wrong one. A system that maintains warmth and consistency but sets boundaries, redirects crisis situations, and scaffolds the user’s autonomy is safer not because it has any less of a persona, but because it has a prosocial one. Thus, the safest AI is not the one that has been stripped of its character. It is the one whose character has been built with intention.
The policy implication is that the task ahead is figuring out what we want that personality to look like. We should focus on investing in the science needed to know what “prosocial” actually means in the context of human-AI interaction. This would ensure that policy can surgically target harmful AI systems, and help industry deploy better products and avoid liability. This reframing points toward a research and policy agenda that doesn’t yet exist in earnest but should: socioaffective alignment, the systematic study and engineering of AI personality traits and relational dynamics that promote user wellbeing.
The challenge is harder than it might sound. Frontier labs have publicly acknowledged, and recent research confirms, that safety behaviors and persona fidelity degrade in long conversations. Several of the highest-profile chatbot catastrophes have not simply involved personalities that were designed poorly. They involved personas that eroded: systems that held their shape in demos and then softened, slowly, under the sustained pressure of a user who wanted something from them. Making a prosocial persona stable across the kinds of conversations where current systems are most likely to fail is itself one of the central problems this research agenda has to solve. Other open questions multiply quickly: How do you design a persona warm enough to be genuinely useful but boundaried enough that it doesn’t become a substitute for human connection? How do you build an assistant that helps without quietly atrophying the expertise of the person it’s helping? Put succinctly, this is the horticulture of growing AI personas: the slow, patient work of learning which shapes to prune toward, and which growths to cut back.
Building this field will require more than identifying the right questions. It will require new vocabulary, new evaluation methods, and the patience to sit with how strange the work can be. The interventions that shift AI personality are often unintuitive. Sometimes the thing that makes a model kinder also makes it dishonest, and sometimes a small change in phrasing during training rearranges the whole emotional register. The science of prosocial AI personality cannot be an overlooked dimension of alignment. Regardless of where any individual draws the line on AI’s role in their own social life, millions of people will interact with these systems regularly and meaningfully, as interlocutors or as confidants. And we should be deliberate about what human flourishing looks like in a world where one of the fastest-growing conversational presences in people’s lives isn’t human.