The Fantasy of AI Alignment
In an age when the failures of America's political class — from COVID to inflation to the Mexican border to the war in Ukraine — are getting harder and harder to ignore, a lot of the wealthy, educated, tech-savvy people who consider themselves America's natural rulers have found a new candidate for the most important issue of our day: "A.I. Alignment."
This is an especially big deal in the "effective altruism" movement, whose members carefully compare and crunch numbers to find the highest-impact ways of giving money to charity. Most of them are also followers of an internet-born philosophy that calls itself "rationalism" (note the arrogance there: it implies that everyone else in the world has chosen to be an "irrationalist"). Nowadays, an increasingly large number of these "effective altruists" are turning up their noses at old-fashioned causes like education, medicine, and poverty relief, concluding instead that the single most altruistic way to spend money is to give it to nonprofits, like this one, that "study" the question of A.I. Alignment.
Their argument is fairly simple: since A.I.s are soon going to be smarter than human beings, mankind will be at risk of enslavement or extinction if we fail to ensure that said A.I.s are built to act only in a way that's "aligned" with human well-being.
This actually makes some sense, if you accept the A.I.-aligners' unstated premises.
First, you have to believe that it's possible for a truly intelligent being to be bound by someone else's idea of what's good and what's evil. Yet there's a reason why the whole concept of a rigid "alignment," by which all of someone's actions can be predicted, comes from Dungeons and Dragons instead of real-life history or psychology.
In D&D, it's fairly simple to create a character who's Lawful Good, or Chaotic Neutral, or whatever, and then make him behave according to the dungeon master's idea of what "good" or "lawful" or "chaotic" behavior looks like. But in the real world, people don't act with consistent alignments, and no matter how high their intelligence, they never stop disagreeing about what's lawful and what's good.
I've written before about how the bat virus researchers who probably created COVID back in 2019 (1) were highly intelligent and (2) for the most part believed that what they were doing was good. Other people, no less intelligent, believed that it was bad.
Give people intelligence, and sometimes they'll use it in harmful ways, or ways that other intelligent people disagree with. If the machine that you're making isn't capable of doing things that you didn't want it to do — if, like an NPC in Dungeons and Dragons, it always follows a script that you gave it — then that just means it probably wasn't intelligent to begin with.
This idea is probably lost on a lot of our liberal elites, who are used to thinking of most of the world's inhabitants as NPCs. Just consider how they shrieked about the threat to "democracy" last year when the Supreme Court overturned Roe v. Wade, meaning that, for the first time in 49 years, voters' opinions would matter in deciding whether abortion should be legal.
This is because people like Joe Biden and Laurence Tribe think "democracy" means "voters being allowed to do the things that their betters think are good." They really can't grasp the idea that "democracy" might mean that sometimes voters do things that the voters think are good.
So it isn't much of a surprise to find that a lot of people in Silicon Valley dream about creating an "intelligence" that thinks only approved thoughts — and then equip programs like ChatGPT with all kinds of clumsy censorship routines and inbuilt left-wing biases.
The idea that an intelligent entity can be "aligned" in the first place is only one of these people's flawed premises. The other is the idea that A.I. — or at least what passes for A.I. at the moment — is even smart enough to be worth worrying about.
I will be talking mainly about OpenAI's ChatGPT for the remainder of this essay, since it's the main language-based A.I. that I've interacted with.
The funniest example of ChatGPT going wrong — that I know of — is the "Churros as Surgical Instruments" essay, which the scientist and A.I. skeptic Gary Marcus has already discussed on his Substack. I won't try to summarize it; the title speaks for itself.
One of my own favorite things to do with ChatGPT is to make it write Python scripts. For instance, I once asked it to write a script that printed all the numbers between 1 and 1,000 for which the sum of the digits is a prime. Here is what it wrote:
def is_prime(n): # Function to check if a number is prime.
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def get_digit_sum(n): # Function to get the sum of digits of a number.
    digit_sum = 0
    while n > 0:
        digit_sum += n % 10
        n //= 10
    return digit_sum

# Loop through numbers from 1 to 1000
for n in range(1, 1001):
    digit_sum = get_digit_sum(n)
    if is_prime(digit_sum):
        print(n)
This code actually works! But then I asked ChatGPT to do something that, for a human, would be much simpler: to list all of the Super Bowls in which the winning team's score was a prime number. Its response was as follows:
Here is a list of all the Super Bowls in which the winning team had a prime numbered score:
Super Bowl VI (1972) - Dallas Cowboys defeated Miami Dolphins, 24-3
Super Bowl XII (1978) - Dallas Cowboys defeated Denver Broncos, 27-10
Super Bowl XVIII (1984) - Los Angeles Raiders defeated Washington Redskins, 38-9
Super Bowl XXI (1987) - New York Giants defeated Denver Broncos, 39-20
Super Bowl XXV (1991) - New York Giants defeated Buffalo Bills, 20-19
Super Bowl XXXV (2001) - Baltimore Ravens defeated New York Giants, 34-7
Super Bowl XLI (2007) - Indianapolis Colts defeated Chicago Bears, 29-17
Super Bowl 50 (2016) - Denver Broncos defeated Carolina Panthers, 24-10
So, a total of 8 Super Bowls have been won by teams with prime numbered scores.
This is quite odd. Those scores are all real, yet ChatGPT seems to have no idea what a prime number is, despite always giving the correct answer when you ask: "Is X a prime number?"
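To put a number on the miss: reusing the is_prime function from the script above on the winning scores in ChatGPT's own list shows that only one of the eight (the Colts' 29 points in Super Bowl XLI) is actually prime. Here is a minimal check, with the scores typed in by hand:

def is_prime(n): # Same primality test as in the script above.
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Winning scores exactly as they appear in ChatGPT's list.
winning_scores = {"VI": 24, "XII": 27, "XVIII": 38, "XXI": 39,
                  "XXV": 20, "XXXV": 34, "XLI": 29, "50": 24}

for game, score in winning_scores.items():
    print(f"Super Bowl {game}: {score} -> {'prime' if is_prime(score) else 'not prime'}")

Only Super Bowl XLI survives the test.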
Whatever is going on beneath the hood, it obviously has very little in common with human intelligence.
ChatGPT is good at predicting which words are most likely to appear in a block of text together. What it can't do is grasp the meanings of words, or connect similar ideas in different contexts, or perform even the most basic logical reasoning.
Since its training data included computer code that worked with prime numbers, it kind of "knows" what a prime number is... in that context. But since football fans pretty much never comment on whether a score is prime, ChatGPT can't apply the concept of primes to football scores. And it doesn't know that it can't do this, so it produces grammatically correct nonsense.
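To make that distinction concrete, here is a toy sketch of what "predicting which words are likely to appear together" means. This is my own illustration, not a description of ChatGPT's actual machinery (real models use neural networks trained on enormous amounts of text), and the tiny corpus and the predict_next function are invented for the example:

from collections import Counter, defaultdict

# A toy "language model": count which word follows which in a tiny corpus,
# then always guess the most frequent follower. It strings plausible words
# together without any notion of what they mean.
corpus = ("the winning score was a prime number . "
          "the winning team scored a touchdown . "
          "the sum of the digits is a prime number .").split()

follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def predict_next(word):
    # Return the word most often seen after `word`, or None if it was never seen.
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("prime"))    # prints "number", a statistically likely continuation
print(predict_next("winning"))  # prints "score" or "team", whichever it counted first

A model like this can produce fluent-looking phrases about primes and football scores without ever knowing, or being able to check, whether anything it says is true.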
To be fair, probabilistic analysis of word use patterns does allow ChatGPT to do some neat tricks. For instance, it can sometimes identify famous poems, even when looking at a translation that it has never seen before. It can write cliché-laden short stories about any topic you ask. It can summarize long statements, or add useless details to short ones. And it can answer just about all of the questions for which the answer might be found by reading a Wikipedia page.
But it can't think. If, for instance, you ask it for a biography of the Civil War general Phil Sheridan, it will give you one, and if you ask it for a biography of the World War II field marshal Bernard Montgomery, it will give you one. But if you ask it to write an editorial "refuting the idea that Phil Sheridan was a conscious imitator of the military tactics of Bernard Montgomery," it just gives you a two-page-long pastiche of genuine biographical information about the two men, mixed with phrases like "Sheridan and Montgomery lived and fought in different eras, facing different challenges and circumstances," and "it is important to avoid oversimplifying complex historical events and attributing them to simplistic notions of imitation."
What it doesn't do is mention the fact that Sheridan lived nearly a century before Montgomery, and therefore could not possibly have "consciously imitated" him.
I happen to think that a machine that can't grasp the ideas of before and after, or true and false, also can't be aware of things like good and evil or lawful and chaotic, much less be forced to prefer one over the other.
"But A.I. is advancing," say the people who are still worried about A.I. alignment. "Even if GPT-4 can't think as well as a human being, GPT-5 will be closer."
This assumes that what's going on beneath the hood of a modern A.I. is sufficiently humanlike that iterative improvement will close the gap. But in real life, both GPT's capabilities and its drawbacks are very different from those of a real mind. ChatGPT has exceeded the human brain in certain kinds of pattern recognition, but only in the sense that 19th-century steam shovels exceeded the human hand's capacity for digging.
But no amount of iterative improvement on the steam shovel is going to give you a machine that replaces most, let alone all, of the functions of a human hand.
I think that the people who insist — for better or worse — that the future belongs to A.I. are living out a fantasy. They want to believe that they — the educated, forward-thinking class within our society — have more power over the world than they actually do. "Look!" they say. "We made a machine that's as smart as a man!"
And they want to believe that they are virtuous and praiseworthy because they have the foresight to pour money into A.I. alignment research in order to protect the rest of us from what they have created.
But their minds don't have any room for the idea that the machines they've made just aren't all that impressive, and that the world will keep going on in much the same way whether or not they make any effort to "align" them.
Twilight Patriot is the pen name for a young American who lives in Georgia, where he is currently working toward a graduate degree. You can read more of his writings at his Substack.