3 Comments
Nat Brown

This is a really interesting piece, and I hope you don't mind a little critique. I like it and don't mean to imply otherwise. It just seems like a significant problem, so there's something to gain by solving it as effectively as possible.

I've had extensive experience with people with this loosely defined kind of psychology, but I'm not a coder, and my understanding of AI is more epistemological than technical. Psychology as a field has pretty mixed opinions about what's going on with these folks, so it's hard to be certain about much. I think that's fine, but without that understanding it's easy to end up on shaky ground. I agree that the deficit in empathy is widely considered fundamental, but empathy can arise from some very different perceptions of an individual's place in the world. I don't even know where to begin when it comes to understanding how an AI functions in any analogous way.

As a human dealing with AI, is there any reason to respond to it other than on the basis of its utility to an individual and their community? You mention things like punishment, and I take that to mean exacting a cost, either with the intention of conditioning or as an expression of power. That's probably a manifestation of (your) politics independent of AI, right? There are certainly ethics involved in human social interactions of those kinds. When it comes to AI, I get that in some sense it's all conditioning (stretching inductive reasoning to include conditioning by analogy here), but not the kind that leverages any sort of 'self'. Is it not simply a matter of switching the AI off if it doesn't work, whether or not there's any engagement with improving it through further coding or 'learning' to bring it back to serving some kind of utility?

Your points about how to 'locate' an errant AI are well taken. A fascinating problem.

Harjas Sandhu

> The one thing that may help us in combatting sociopathic AI agents is that we’ll likely not feel empathy for them. We’ll find it relatively easy to cut them off, pull the plug, or ban them. In fact, the biggest stumbling block in reining in human sociopaths is that we tend to feel empathy even towards them and thus we often don’t punish them to the extent that would be appropriate for their actions.

Is this true? I recall there being a big outcry when OpenAI tried to make 4o less sycophantic...

Claus Wilke

Yeah, those may be two different issues. But you're right, people do like to anthropomorphize.

In any case, most people have no problem with turning off a bot that is posting inflammatory content. Most people would have serious concerns about imposing comparable penalties on a person who is posting inflammatory content.