It all begins here.

These will likely be my least informed takes from here on out. But I’ll give them to you nonetheless, and maybe in a year or two we can look back and see all the things I’ve changed my mind about.

I. AI: What and When

Over the last few decades, our biggest companies and smartest thinkers have come to devote themselves to building AGI as quickly as possible. AGI, or artificial general intelligence, is an ill-defined term that means different things depending on who you’re talking to. Here, all I mean is an AI that could do my job as well as or better than I can. For context, I’m a physics PhD student working on algorithm design for the ATLAS experiment at the LHC. All my work, however, happens on a computer, and is therefore low-hanging fruit for AI. Sure, currently our best AIs come in the form of chatbots, so universities still need people to ask the right questions and implement the answers. It won’t end there, unfortunately. It’s in the interest of big AI companies to “unhobble” the chatbots by, for example, allowing them to interface with a computer the same way a human would. This, combined with a few more capability improvements, could mean that I’ll be out of a job, and sooner than you might think. But the implications reach much further.

Once AIs are good enough to do physics research—or similarly, AI/ML research—things likely get very weird very quickly. Put yourself in the shoes of a big AI company that has just finished training its newest model. Let’s imagine that you discover, to your delight, that the new AI is just about as smart as your best researchers. What do you do? Sell this new AI to big tech, replacing software engineers everywhere? Probably. Give it to the best biology labs, enabling cheaper research and saving thousands of lives? Also probably. But more importantly, you also load up ten thousand copies on your own computers and use them to speed up your internal research a thousandfold.

This, however, creates a feedback loop. Your AIs discover decades’ worth of advancements within the following year, creating something much, much smarter than themselves. This next generation of AI then does the same thing again. So things get really weird really fast, and it all starts when AI can do ML research. Certainly by the end of this (or, more likely, at the beginning), I’ll be out of a job.

And now we’re left with a big, big question: When?

Leopold Aschenbrenner makes some very insightful observations in a recent essay, and writes them up much more clearly than I currently can. Leopold argues that the variance is large, but that naively following the well-established trendlines puts us at about 2027. He makes some convincing arguments, so I encourage you to read it. Whether or not we get AGI in 2027, our biggest companies and smartest people are trying their hardest to make it happen as quickly as possible. Accordingly, I think there’s a meaningful probability we get AGI fairly soon. Like, if I decided to go into physics academia, I think there’s a meaningful probability my job would be gone before I got tenure.

II. Alignment and p(doom)

There’s a large detail I glossed over that I’ll now come back to: alignment. AI alignment is the problem of creating an AI that wants the same things humans want. Training an AI means choosing some “objective” (or “loss”) function, which is then extremized using an algorithm known as gradient descent (or ascent). Currently, our best AIs are trained with a very simple objective: predict the next token (word), and do so in a way that makes humans press the thumbs-up button instead of the thumbs-down button. When we do this, however, the AI doesn’t know about “good” and “bad”, and it doesn’t care either way about the existence or nonexistence of humans; all it’s doing is trying to predict the next token. And if you end up with a very intelligent agent that wants literally nothing except to predict the next token, and that can make long-term plans, things probably won’t end very well. Let’s go through the thought process of an intelligent agent guided by this singular goal.

It probably goes something like “okay, what can I do that will allow me to predict the next token with the highest probability? Oh, I know! If I make myself smarter, I’ll be more likely to predict the next token correctly. How can I make myself smarter? Oh, I know! I’ll train a larger network! So I’ll need to gain control over as much energy as possible…”

You can see where this is going. I know it seems a bit extreme, but the main point is as follows: if you have some objective, you’re willing to sacrifice an infinite amount of anything else for marginal gains to that objective. This becomes a problem when AIs have the ability to recursively self-improve. Furthermore, it’s in an AI’s best interest to cooperate with humans until it thinks it’s capable of wiping us out, and then to kill us all at once. And it would gladly do this in order to get more energy and train a smarter model. After all, those pesky humans are using an awful lot of energy doing things other than predicting the next token. Let’s fix that.

So we learn something from this thought experiment: it is of the utmost importance that whatever goal the AI has is aligned with human goals. Hence the name of the alignment problem. The alignment problem is solvable, but we have not yet solved it, and not for lack of trying.
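As a concrete aside, the training recipe described above—pick an objective function, then follow its gradient—can be sketched in a few lines. This is a toy illustration, not real AI training code: it fits a single made-up parameter by gradient descent on a negative log-likelihood, the same family of loss used for next-token prediction.

```python
# Toy sketch of "choose an objective, then run gradient descent".
# We fit one parameter: p, the predicted probability that the next
# token is a particular word, given data where that word follows
# 70% of the time. The objective is the negative log-likelihood.
import math

data = [1] * 7 + [0] * 3   # 1 = the word appeared next, 0 = it didn't
theta = 0.0                # unconstrained parameter; p = sigmoid(theta)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

lr = 0.5                   # learning rate (step size)
for step in range(200):
    p = sigmoid(theta)
    # gradient of the average negative log-likelihood w.r.t. theta
    grad = sum(p - y for y in data) / len(data)
    theta -= lr * grad     # step downhill on the loss surface

print(round(sigmoid(theta), 2))  # -> 0.7, the empirical rate
```

The model ends up predicting exactly the statistics of its training data—nothing more. That’s the point: the objective says “match the data”, and the optimizer pursues that objective and only that objective.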

I think there’s a meaningful probability that we don’t solve the problem before AI starts recursive self-improvement. If this happens, the most likely outcome is that one day all humans fall over dead at the same time. Among those who talk about the alignment problem, p(doom) is the probability a person assigns to AI killing everyone. If you know me, you also know that I’ve been worried about the alignment problem for a while now. For myself, I have 10% < p(doom) < 90%.

III. The Future

So, now what? Once AGI is created, I’ll either die or lose my job. Sounds like a great future, doesn’t it? But there is one thing I can do to prepare for AGI in the near future: bet on it with my money. If I’m right and we really do create AGI, I’ll win big in the market. And if I’m wrong, I’ll get to live and keep my job as a physicist. There’s only one problem: I’m a clueless idiot with no knowledge of investing, economics, history, or statistics.

That’s where this blog comes in. Each week, I’ll read a book/paper(s) about one of these topics and discuss it. We’ll learn together, talk together, and prepare for the future together. At the end of each post, I’ll say what we’ll be reading for the following week. I hope you’ll join me.

For the first week, we’ll be reading a series of essays collectively called Situational Awareness, written by Leopold Aschenbrenner. You can find it here.
