Let’s Discuss: Economics in One Lesson
I discuss Economics in One Lesson, an introductory book about basic economic principles and fallacies.
Economics in One Lesson was written in 1946 by Henry Hazlitt, an economist and journalist who wrote about business, the economy, and markets for many well-respected publications. Hazlitt was a staunch supporter of free markets, which can easily be seen in this book. With that said, Economics in One Lesson is a surprisingly simple book, despite the large amount of exposition on many seemingly unrelated topics. Hazlitt begins by explaining a single rule, followed by applications of that rule in different contexts. Finally, the book concludes by describing a set of more general principles that can be drawn from this analysis.
The single lesson that Hazlitt suggests is “looking not merely at the immediate but at the longer effects of any act or policy … tracing the consequences of that policy not merely for one group but for all groups.”
On the surface, this sounds not like a rule, but instead like an articulation of common sense in regard to economic and political endeavors. His real, unspoken argument, however, is that current policies and schools of thought fail to adhere to this “rule.” In each chapter, Hazlitt examines a different example, first explaining how conventional thinking fails to adhere to the rule, followed by the claim (with support) that free markets are a better solution. It is important to note, however, that nowhere in the book does Hazlitt argue that free markets are a “perfect” and “just” solution; rather, he argues that free markets are merely the best of often bad options.
I learned a lot from this book and I think there are some good ideas; however, I also have some (seemingly) large disagreements, especially surrounding actual policies. Not much more can be said without looking at some of the examples themselves, so let’s jump in. Like always, I’ll try to steel-man his positions before stating my own.
Chapter II. Broken window
While the example presented in this chapter, the broken-window fallacy, is very oversimplified, Hazlitt argues that the lesson learned will be applicable to hundreds of real world examples. The example goes like this: a criminal breaks the window of a shopkeeper, which for this example we will say costs $100. The shopkeeper must then spend this money to replace his window, thus supporting the glass industry. Indeed, if windows never needed replacement, the glass industry would be much worse off. So, because of this broken window, the glass merchants will be $100 richer, which they can use on further goods, ultimately stimulating additional industries.
Are the actions of the criminal, then, a good thing? Have they stimulated the economy? Of course not, Hazlitt says. It is certainly true that the glass merchant is $100 richer, which he can use to purchase more goods. But it is also true that the shopkeeper is $100 poorer. The same money that was spent by the glass merchant will now not be used by the shopkeeper. And accordingly, the economy is no better off for the broken glass.
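The bookkeeping behind Hazlitt’s point is simple enough to write down. Here is a minimal sketch (with the $100 figure from the example; everything else is just accounting):

```python
# A minimal ledger for the broken-window example.
# The criminal's act moves $100 from the shopkeeper to the glazier,
# but creates no new money: the net change across the economy is zero,
# and the shopkeeper has lost a window besides.

ledger = {"shopkeeper": 0, "glazier": 0}

# The shopkeeper pays $100 to replace the window.
ledger["shopkeeper"] -= 100
ledger["glazier"] += 100

net_money_change = sum(ledger.values())
print(net_money_change)  # 0: money was transferred, not created

# Real wealth, however, went down: society spent $100 of labor and
# materials just to get back to where it started (one intact window).
```

The transfer is visible (the glazier’s gain); the forgone spending is not. That asymmetry is the whole fallacy.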
This is of course a simple and rather obvious example, but Hazlitt will go on to make very similar arguments about many real world examples.
Chapter IV. Public Works
This chapter applies the broken-window example to government spending. In particular, let’s imagine the government spends money on some project, say, building a bridge. This may seem like a good thing not only because of the bridge itself, but also for the employment of the construction workers and the stimulation of the steel industry and so on. However, Hazlitt explains that, like in the broken-window example, we have forgotten the secondary, hidden effects of this project. To appreciate these effects, we must look at where the funds for the bridge came from in the first place. For this, there are three options:
1. Taxes
(If, instead of direct taxation, the money is initially borrowed, it will eventually have to be paid back through taxation.)
If the government funded the bridge through taxation, a very common method indeed, then the people are poorer by exactly the cost of the bridge. Each dollar that goes to the construction workers and the steel industry is a dollar not spent by everyone else. There are unmade cars, unbuilt houses, unbought goods, and so on, exactly as a result of the bridge. So, yes, one small sector of the economy is better off, which can easily be seen. But other parts of the economy are worse off by the exact same amount.
In order to determine if the bridge was a good undertaking, then, there is only one important question to ask: is the bridge worth more to the people than the goods they would have otherwise purchased? The answer to this question, of course, depends on the specific government project in question, and on its price.
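Hazlitt’s one question can be phrased as a single comparison. The sketch below is my own framing of it, and the dollar figures in the usage lines are hypothetical:

```python
# A hedged sketch of the only question Hazlitt says matters for a
# tax-funded project: is the project worth more than what the taxed
# dollars would otherwise have bought? All numbers are hypothetical.

def project_is_worthwhile(value_of_project: float,
                          value_of_forgone_goods: float) -> bool:
    """True if the public project creates more value than it displaces."""
    return value_of_project > value_of_forgone_goods

# Hypothetical: a bridge the public values at $12M, funded by taxes
# that would otherwise have bought $10M of private goods.
print(project_is_worthwhile(12e6, 10e6))  # True: net gain
print(project_is_worthwhile(8e6, 10e6))   # False: net loss
```

The hard part, of course, is estimating the two inputs; the comparison itself is trivial.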
2. Reallocation of Government Funds
Another way to fund the bridge is to defund another government project by the cost of the bridge. This case is very similar to taxation: the money given to the construction workers and the steel industry is not given to a different industry. While one industry is again better off, another is worse off. It is then, of course, necessary to determine which project and which industry is more important.
3. Inflation
The third and least preferable way for the government to fund the bridge is to print new money. This, of course, causes inflation, which is in many cases the most insidious form of taxation possible. There’s not much more to be said about this one.
This shows a nice application of the broken-window example in regards to government projects. The lesson we learn is not that government projects are bad. Instead, we learn that government projects are good only if the result is more valuable than the sum of the goods that would have otherwise been created. Additionally, we learn that government projects created “to give people jobs” may not be a good justification; the jobs created in the construction and steel industries are lost by the marginal workers in many other industries (the industries of the goods that would have been otherwise purchased).
With these two examples understood, I’ll try to articulate the general principle that Hazlitt is getting at. (I’m not a trained economist, so it’s possible that my analysis here will have some flaws). In particular, we can see the global conservation of money, or in other words, the principle that money spent in one sector must then not be spent in another. Furthermore, any one person’s loss is another person’s gain.
Of course, if new money is printed, this global conservation is broken. But this additional money would be paid for in inflation. Accordingly, the general principle is not that of conservation of money, but rather global conservation of the value of goods.
Except, if better technology is created, then this conservation is again broken. So we can explain our general principle as follows: the global value of goods remains constant without technological change, but increases with advancements and decreases with regressions.
Chapter X. The Fetish of Full Employment
A nice illustration of this principle appears in chapter X, where Hazlitt argues that full production is a better metric than full employment. Hazlitt begins the chapter with the observation that much economic progress has come from the ability to produce more with the same amount of labor, such as moving from carrying goods on the backs of mules to the wheel and wagon. (This ability to produce more with the same amount of labor is exactly what I would call technological progress.)
Accordingly, Hazlitt argues that the true goal should be to maximize production. He then claims that, in doing so, full employment will follow. But it is of the utmost importance, Hazlitt explains, that we do not forget the true goal and turn our attention solely to full employment.
This brings us to my first major disagreement with Hazlitt. To me, the foundational goal is the happiness of the people, not economic growth for the economy’s sake. Simply trying to build up our economy can lead to cases where we sacrifice (or forget) the wellbeing of a smaller group in order to help the economy as a whole — an insidious effect indeed. Once we start thinking about policy positions, we’ll have to keep this in mind.
Chapters XVII-XIX. Price Fixing, Rent Control, Minimum Wage
It’s not surprising, given Hazlitt’s strong conviction about free markets, that he goes on to denounce rent control and minimum wage laws. Of course, we must fully understand his position before we can determine if it is good or bad.
We’ll first take a look at price fixing in general, which we can then apply to rent control and minimum wage as well. Hazlitt begins with the assumption that, when the government fixes the price of a good, it does so below the market price; for if it fixed the price at the market level, Hazlitt explains, that would be equivalent to no fixing at all. When the price of a good is required to be below the market level, this discourages the production of that commodity. (Furthermore, the marginal producers are put out of business.) In addition, such price fixing increases the demand for the good, since the price is lower. It is clear, then, that there will be a shortage of the good as a result: exactly the opposite of the goal of price fixing. Hazlitt also explains that, while price fixing can appear to work for a short while, demand will eventually surpass the supply of the good, thus causing many problems.
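To make the shortage argument concrete, here is a toy linear supply-and-demand model. The curves and all numbers are my own illustrative assumptions, not Hazlitt’s:

```python
# A toy linear supply/demand model illustrating the price-ceiling
# shortage argument. Curves and numbers are illustrative assumptions.

def demand(price):   # quantity demanded falls as price rises
    return max(0.0, 100 - 2 * price)

def supply(price):   # quantity supplied rises with price
    return max(0.0, 4 * price - 20)

# Market-clearing price: 100 - 2p = 4p - 20  =>  p = 20, q = 60
market_price = 20
assert demand(market_price) == supply(market_price) == 60

# Fix the price below market, as Hazlitt assumes the government does:
ceiling = 15
shortage = demand(ceiling) - supply(ceiling)
print(shortage)  # 70 demanded minus 40 supplied: 30 units short
```

Lowering the fixed price further widens the gap from both sides: buyers want more, producers supply less.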
Rent control and minimum wage laws, Hazlitt argues, are just special cases of price fixing, and thus yield the same consequences. Rent control is generally proposed on the basis that the housing supply is inelastic, i.e., that price does not heavily affect the quantity available. Similarly to the case of general price fixing, Hazlitt argues that the negative effects of rent control worsen the longer it remains in place. For example, the production of new housing is strongly disincentivized, thus further exacerbating the shortage. Furthermore, landlords will tend not to repair or remodel apartments (unless the appropriate rent increases are allowed). So we end up with not only a shortage of housing but also a downgrade in its quality. Hazlitt then discusses a number of additional measures governments attempt in order to mitigate these effects, each of which, he argues, fails in the long run.
In regard to minimum wage laws, Hazlitt first explains that a wage is equivalently a price paid by the employer, and thus minimum wage laws are another form of price fixing (here, a price fixed above the market level). As a consequence of such a law, a shortage of the good being sold — employment — comes about. For example, Hazlitt posits that workers whose value to an employer is less than the minimum wage will be laid off. Furthermore, marginal producers may be put out of business, again exacerbating the shortage.
One contention I have with these arguments is as follows: when the price of a good is fixed lower than it would otherwise be, what if the difference comes out of company profits rather than creating a shortage of the good? Of course, the marginal companies will not be able to withstand a reduction in profits and will therefore still be pushed out of the market. But the non-marginal companies, those with sizable profits, would certainly be able to lower their profits in accordance with the price fix. If my understanding of Hazlitt is correct, he would respond by saying that, while this is possible, it would also reduce the incentives for making the good, which would reduce the amount produced, thereby perpetuating the shortage. I agree that this price fixing would reduce the incentives for new companies to enter the industry. But existing (non-marginal) companies are still incentivized to produce the good, even with reduced profits. Reduced profits are still profits, so the non-marginal companies should expand their production to meet the demand.
Accordingly, it seems that we may not get a shortage, but rather a reduction in the number of companies in the industry, a lack of new companies entering it, and an increase in the wellbeing of the remaining workers (including those who left their old jobs to work at the remaining companies).
There are many more examples discussed in this book, but I feel that the main ideas and main disagreements are nicely illustrated by those discussed here. Ultimately, Economics in One Lesson was a fun read, and I learned a lot about how to think about economic concepts and policies. Of course, with this being my first economics book I still am very uneducated, and I therefore may be missing important considerations. With this said, I found myself agreeing with a lot of the general principles Hazlitt explains, but disagreeing with many of his policy positions. My critiques of Hazlitt’s analysis of price fixing, for example, are indicative of a more general disagreement that I have with his positions. Often, I feel that he is willing to sacrifice the wellbeing of a small group of people (often those already less fortunate) for general economic growth, which I feel can be particularly harmful.
Finally, it is important to note that all of the topics discussed in this book are largely done from a theoretical lens (especially given the age of the book). Accordingly, I would need to see an analysis of our current economy to determine if, in practice, the theory is correct. Unfortunately, my knowledge in this regard is lacking; I would need to do more research before having a stronger conviction about my thoughts.
In the next post, I’ll be discussing The Making of the Atomic Bomb by Richard Rhodes.
Let’s Discuss: Situational Awareness
I discuss Situational Awareness, a series of essays about the next decade of AI.
Situational Awareness is a series of essays written by Leopold Aschenbrenner regarding the future of AI within the next decade. Leopold presents a strikingly terrifying future, and gives his input on what we should do to prepare. Behind all of this, however, is a surprising number of descriptive statements, which makes this an especially tough piece. After all, if Leopold's predictions are correct (or close to it), then there are some tough truths to accept.
I'll start by giving my best shot at a summary of each section. I'll likely get stuff wrong, but hopefully I'll get a few things right too. I'll also try to make it clear what I disagree with. Let’s get started.
Part I. Counting the OOMs
(An OOM is an order of magnitude; increasing by one OOM means multiplying by 10.)
The main idea behind this part is quite simple: by measuring the speed of progress in recent years, we can extrapolate to predict how far we are from AGI. In fact, we can do a bit better: we can also look at reasons to believe that we will continue (or not continue) to follow these trends.
AGI in this section is defined similarly to how I did so last week: an AGI can do ML research. Once this threshold is passed, AI research itself can be automated with millions of AI copies that never sleep. This thus starts a feedback loop that ends with something akin to superintelligence.
Leopold first notes that, within the 4 years from GPT-2 to GPT-4, our models went from "about as smart as a preschooler" to "about as smart as a very bright highschooler". This progress was then broken down into three different categories as follows:
1. Compute. We're using bigger computers to train our models.
Leopold points out that, between GPT-2 and GPT-4, we had a 3.5-4.0 OOM increase in compute, and we have no reason to believe that this will slow down anytime soon. One large limiting factor here is going to be raw energy consumption, but companies are already planning massive clusters. In fact, a later section is about just this.
2. Algorithmic efficiencies. We're writing better algorithms that require less compute for equal capabilities.
Although this is a bit less intuitive to quantify, one can make very good estimates here based on performance on benchmarks (tests for the AI), as well as API costs. If we do X amount better on some benchmark with the same number of FLOPs, we can say we've made algorithmic progress. We can also track API costs: as our algorithms get more efficient, AIs get cheaper to run.
Leopold counts the numbers here and estimates that we made approximately 2 OOMs of progress from algorithmic gains between GPT-2 and GPT-4. Again, he argues that we have no reason to think this will slow down very much.
3. "Unhobbling gains". Giving the AI better access to tools, whether it be directly within the model itself, or external tools such as a code compiler.
To me, this was the most interesting of the three, as I hadn't thought much about it before (I certainly hadn't sectioned it off in my head as its own discrete thing). Leopold points out a huge list of unhobbling gains between GPT-2 and GPT-4. Some of the most notable are reinforcement learning from human feedback (having a human press thumbs up or thumbs down on the model output), chain-of-thought reasoning (forcing the model to "think out loud"), and allowing the model to use tools such as calculators and web browsers. From these, Leopold estimates that, between GPT-2 and GPT-4 (again, calculated from benchmarks), we got something like 2 OOMs of improvement, although this category is harder to quantify than the others.
It also seems clear that we're nowhere near the limit on unhobbling gains. Some that we should expect to see in the future include: full access to a computer, larger context length (input length) allowing for fast onboarding to current projects, and spending longer time "thinking" about hard problems vs easy problems. Potential gains from this in the future could be absolutely massive.
Ultimately, Leopold estimates that we are on track to maintain the same speed of progress as between GPT-2 and GPT-4. If he's right, this means that by 2027 we'll have another jump analogous to "preschooler to smart highschooler", but this time it'll be "smart highschooler to ???". Leopold then argues that this resulting AI will likely be able to do ML research, especially since it’ll have read every AI/ML paper that’s ever been written.
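Since OOMs add in log space, the bookkeeping behind this extrapolation fits in a few lines. The per-category figures below are the estimates quoted above; treating them as additive is the essay's own framing:

```python
# Back-of-the-envelope OOM counting, GPT-2 -> GPT-4, using the
# estimates quoted above. OOMs of effective compute add in log space.

ooms = {
    "compute": 3.75,     # midpoint of the quoted 3.5-4.0 OOM range
    "algorithmic": 2.0,  # estimated algorithmic-efficiency gains
    "unhobbling": 2.0,   # rough estimate; hardest to quantify
}

total_ooms = sum(ooms.values())
effective_multiplier = 10 ** total_ooms

print(total_ooms)                     # 7.75 OOMs of effective scale-up
print(f"{effective_multiplier:.1e}")  # roughly a 10-million-fold jump
```

Leopold's claim is then that a comparable sum of OOMs is achievable again by around 2027, yielding a jump of similar size.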
I agree with the calculations done in this section, and I wouldn’t be surprised if we end up with AGI before 2030. However, there is one relevant possibility that Leopold doesn't discuss in this section that I think has a meaningful probability of occurring:
As we continue to increase compute, discover algorithmic efficiencies and collect unhobbling gains, what if capability gains receive diminishing returns? At the surface, it may seem like we have no reason to believe this. Our models have been doing better and better on benchmarks with no signs of slowing down. But I posit that there is one key piece of information in this regard: what is the AI actually learning? Let's consider two simplistic options:
1. LLMs just do statistics over lots of training data. They can have no internal model of the world, including representations of complex objects or ideas.
2. LLMs have an internal world model, including representations of complex objects and ideas. They are able to manipulate these representations, have "ideas", and operate within the world model.
Of course, these are very simplified views; likely there is some mixture going on. But again, it will be useful to consider these two limiting cases to examine why we might see diminishing returns on capabilities.
If option 1 is correct, we have no reason to believe that the AI will be able to come up with any fundamentally new ideas. The absolute best we can expect is the synthesis of current ideas; that is, combining ideas in ways that have not been done before, often creating something more than the sum of the parts. If this is indeed true, and if furthermore this is a general characteristic of LLMs, then we will start to see diminishing returns on capabilities in the coming years. For eventually we will exhaust all of the ideas (and combinations of ideas) that have been proposed by humans regarding ML research. And once these combinations have been exhausted, the AIs will no longer get better at ML research, and we thus might never start the recursive improvement loop.
If option 2 is correct, we will likely not see diminishing returns. As we increase compute and discover algorithmic efficiencies, the AI's world model will get better, and eventually surpass our own. It will therefore eventually be able to do ML research as well as or better than humans.
If, as is indeed likely, LLMs operate as some combination of these options, then we might start to see diminishing returns. For the increase in capabilities can be broken down into two parts: that attributed to a better world model, and that attributed to better statistics producing combinations of existing ideas. As we increase compute and discover algorithmic efficiencies, the AI's world model will improve, but we will also run out of existing ideas (and combinations thereof) regarding ML research. Hence, we would see diminishing improvements to AI capabilities in the field of ML research with increasing compute and algorithmic efficiency.
An important question, then, is whether these diminishing returns will start before or after our LLMs can do successful ML research. And if they indeed start before this critical moment, how drastic will the effect be? If I had to guess, I would say the following:
It seems to me that a good way to create coherent sentences is to have internal representations of less complex objects and ideas, but not to have a fully complete world model. The AI can thus rely on statistics to operate within the space of more complex ideas, and can interface with its internal representations in order to output coherent responses. If this is correct, then we may run out of combinations of ideas (regarding ML research) before reaching recursive self improvement, and also before the AI's world model is complex enough to have good new ideas. This would then result in a very drastic slowdown in capability gains in the future, pushing back the recursive ML research loop—and thus AGI—many years into the future.
However, it is also possible that new combinations of old ideas will improve the world model of the subsequent AI to a degree such that it will be able to come up with new ideas.
It is worth noting that my intuitions regarding the inner workings of LLMs are much weaker than those of any researcher actually working on the LLM frontier, and thus I assign a very large variance to these ideas. Accordingly, I would not be surprised to see either of the two outcomes I suggest (or for the truth to unfold outside this simplistic model entirely).
Part II. The Intelligence Explosion
The central claim in this piece is something I've discussed previously: Once we get automated ML research, it's not long before we get superintelligence through a simple feedback loop of AI progress.
In short, Leopold does a few sanity checks here to make sure we're not missing anything. For example, a) how many copies of this AI researcher will we be able to run with our limited compute, and b) how hard will it be to get to this level of AI in the first place?
To a), the answer is a lot: likely many millions. For b), see Part I.
Part III a. Racing to the Trillion-Dollar Cluster
The main argument of this piece concerns the feasibility of the trillion-dollar cluster, which would (based on previous estimates) be enough to reach AGI. (In fact, we may not even need a cluster this big, depending on algorithmic and unhobbling gains.) The argument is broken down into smaller chunks, taken one at a time. First, Leopold argues that the revenue of major AI companies (and associated industries) would be enough to finance such a cluster. He also gives a number of historical examples of projects undertaken at a similar price, providing additional evidence for feasibility.
Next, raw energy consumption (power) is discussed, which is likely the largest bottleneck to the trillion-dollar cluster. Leopold seems to think that once people "wake up to AGI" (i.e. realize that it could happen soon and what it implies), we'll ease up on regulations and will therefore be able to set up enough sources of power. (He also notes that, if we don't ease up on these regulations, we'll probably end up putting clusters in other countries, such as in the Middle East, which is a horrible idea on the basis of national security.)
Finally, chips themselves are discussed. Like all of the other constraints, producing enough chips seems totally feasible as long as companies decide to do so. Personally, it seems reasonable that chip manufacturing companies will make it happen, simply because they're in the perfect position to see the AGI boom coming.
My further thoughts on this section are two-fold. In some sense, as we discussed regarding chips, the feasibility of the cluster is only half of the conversation. An equally important question is whether we (the United States) will actually try. In order to produce enough power for the cluster, for example, existing regulations will likely need to change. Ultimately, this latter question reduces to "when (and whether) will the US government and chip companies become AGI-pilled?" As for the US government, Leopold discusses this later.
Part III b. Security for AGI
This piece argues that, given the power of AGI/ASI (all starting from the first automated ML researcher), it is of the utmost importance that our (the United States') AGI be protected as a matter of national security. Such power would allow for insane dystopian futures if put into the wrong hands, including authoritarian states such as the CCP. (For those with less knowledge in this regard, Leopold has a short list of atrocities undertaken by the CCP in section IIId of Situational Awareness.)
Leopold argues that there are two main threats we need to protect against: the stealing of algorithmic secrets, and the direct stealing of model weights. The second is a much more visceral threat, and is in some sense clearer to protect against (not easier, but clearer how it should be done). The first, the stealing of algorithmic ideas, is much harder to prevent, given that one person defecting (being bribed) or being kidnapped could mean the end of such secrets (and thus the end of the AI lead held by the United States).
This piece then claims that the security needed to prevent both (or either) of these threats would only be possible with the help of the government. Leopold lists a few of the kinds of security measures we would need, which include airgapped datacenters with physical security equal to that of a military base, better encryption algorithms, all researchers working in SCIFs (see this visualization), and extreme personal security clearances for researchers.
Finally, Leopold argues that we are not on track to make this happen; he seems to predict that sometime in the next few years, China will steal some big algorithmic secret, thus causing a reform in our security measures. He hopes this does not happen too late.
Part III c. Superalignment
I have a lot of critiques of the arguments presented in this section. But like always, I'll try to steel-man his position during my summary.
Superalignment is the same old alignment problem, but specifically for a superintelligent AI. If you're unfamiliar with the alignment problem: aligning an AI means aligning the AI's goals with human goals such that it doesn't kill us (in other words, it's a control problem). To explain why the superalignment problem is so hard, Leopold juxtaposes it against our current (sufficient but not perfect) alignment techniques. Currently, the main technique we use to align our language models is RLHF (reinforcement learning from human feedback), where we click the thumbs up or thumbs down button after reading the output of the model. The model is then trained to make people press thumbs up instead of thumbs down. This, however, fails when the model gets sufficiently smart. How, for example, do you press thumbs up or down on a million lines of code? How do you press thumbs up or down when the model is using complex thought processes that you cannot understand?
It's hard to align a superintelligent agent because we can no longer evaluate whether the AI is doing good or bad things. (Not, that is, until something really bad or really good happens; then we'll know.)
Leopold, however, is not a doomer, and he thinks that we'll "muddle through" on this one. His prediction for how things will go seems to be roughly as follows:
1. We'll start by aligning the somewhat-superhuman models (the same models that will start the recursive ML research).
2. We'll use these models to align the smarter ones.
In order for this kind of scheme to work, Leopold presents a few areas of research that are particularly important.
a. Scalable oversight. In other words, can you have a less intelligent verifier to evaluate the more intelligent models? It is well known that verification is much easier than generation. Accordingly, how much less intelligent do you have to be to still act as a good verifier?
b. Generalization. By studying how current models generalize as they get smarter, maybe we can learn something about how a superintelligent agent will generalize as it gets a bit smarter. Accordingly, as we improve our models by the next OOM, we can try to be sure that they won't generalize in unwanted ways.
c. Top-down interpretability. For example, can we create some kind of lie detector for an AI? (This is contrasted with "bottom-up interpretability", which attempts to understand all of the thought processes and model weights that create any given output.)
d. Adversarial testing / measurements. This one is heavily related to all the others, but the main idea here is to try to encounter every possible failure mode in the lab before we publicly release the model.
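The verification-is-easier-than-generation intuition behind (a) can be shown in miniature with a generic example of my own (not Leopold's): checking a claimed sort is cheaper than sorting.

```python
from collections import Counter

# Verification vs generation, in miniature: checking a proposed
# answer is often far cheaper than producing one. This is the
# intuition behind scalable oversight: a weaker verifier may still
# check a stronger generator's work.

def generate_sorted(xs):
    """'Generation': produce a sorted copy, O(n log n)."""
    return sorted(xs)

def verify_sorted(orig, result):
    """'Verification': check a claimed answer in O(n) -- same items,
    nondecreasing order."""
    return (Counter(orig) == Counter(result)
            and all(a <= b for a, b in zip(result, result[1:])))

claimed = generate_sorted([3, 1, 2])
print(verify_sorted([3, 1, 2], claimed))    # True
print(verify_sorted([3, 1, 2], [1, 2, 2]))  # False: items were changed
```

The open question Leopold raises is how far this asymmetry stretches: sorting is an easy case, and it is unclear how much weaker a verifier can be for genuinely hard outputs like research or million-line codebases.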
If we understand these things (and a few others) sufficiently well, Leopold thinks that we'll muddle through, and that our automated AI researchers will be sufficiently aligned in order to align the next, smarter model.
As mentioned previously, I have a few critiques here. My main criticism is that Leopold masks the difficulty of the alignment problem by presenting an even harder problem (superalignment). Leopold’s predictions are predicated on the assumption that we can successfully align a regular-intelligence model. While I agree that this is in theory a solvable problem (one much easier than superalignment), we’re not on track to solve it in time. Leopold mentions that there’s a lot of low-hanging fruit in alignment research, with which I fully agree. But he also agrees that the number of people actually doing the work is terrifyingly small. So unless something drastic changes in the next few years (especially given Leopold’s short AGI timelines), it seems unlikely to me that we’ll actually solve it. Ultimately, this section reads to me like Leopold says “we’ll muddle through” and tries to make it convincing by creating a list of important research areas, then goes one by one saying “we’ll muddle through” to each individually. That’s not an argument, even if it sounds like one.
It’s possible that Leopold’s model is something like “once America wakes up to AI and begins The Project (part IV), we’ll put a sufficient amount of funding and compute towards solving the alignment problem.” This is certainly possible, but our biggest companies are not doing so currently; I hope this changes drastically once we’re closer to true AGI.
Part III d. The Free World Must Prevail
If you're on board with everything that's been argued up until now, this one kinda comes for free. The main arguments here are as follows:
1. Whoever first creates AGI will end up with a massive military advantage.
Again, once you get an intelligence explosion, scientific progress would skyrocket on extremely short timelines, giving rise to technologies we can only imagine. This probably also includes more powerful weapons of mass destruction, as well as defenses against current nukes (whether through disarmament or protection/absorption).
2. America should lead in this, certainly over other options such as the CCP, North Korea, Russia, or some terrorist group from the Middle East.
There's not much else to say about this one.
Part IV. The Project
At the heart of this piece is one central descriptive claim: Once America becomes AGI-pilled, we'll start a government project of similar scale to the making of the atomic bomb. If this indeed happens, Leopold hopes that those in charge do a good job.
It’s important to note that the predictions of this section are predicated on his short AGI timelines. If creating AGI took 70 years instead of 7, the global AI landscape would look much different, including security measures.
Conclusions
In Situational Awareness, Leopold presents a surprisingly bleak prediction about the future. His main line of prediction seems to be something like: "sometime around 2026, China will steal some important algorithmic secret from the US, and we'll start to take things more seriously. We'll ramp up security, and hopefully the government will get involved. (If not, companies will likely be locked in a tight race, causing less time to be spent on alignment and making things enormously more dangerous. Also, if the government doesn't get involved, our security measures will not be enough to prevent espionage from the CCP.) Hopefully, we'll act fast enough that China doesn't steal our secrets. (Again, if we're locked in a race with China, we'll spend less time on alignment...) Ultimately, we'll muddle our way through the alignment problem, be the first country to create superintelligence (thanks to The Project), and then we'll all put our hands in our pockets and walk away into the sunset."
The two places I disagree most heavily with Leopold are a) the probability/timeline with which we reach AGI, and b) the probability with which we solve alignment. For both of these disagreements, the root seems to lie in the larger uncertainties I assign to potential futures. For example, while I accept the possibility of AGI within the next decade, I also think it is possible that we’ll see a reduction in capability gains with upcoming compute/algorithmic advancements; if this happens rapidly enough, we may not reach AGI anytime soon. Moreover, I put the probability that we solve the alignment problem lower than Leopold does. Again, it’s totally possible that we’ll figure it out before the first automated ML researcher, but I don’t think we’re quite on track to do so; I would not be surprised by either outcome.
With all of this said, Situational Awareness is an incredibly important piece. Leopold’s discussions of national security and The Project are especially so, given the small amount of existing dialogue. In this regard, I mostly agree with the arguments presented; it is of the utmost importance that AGI does not fall into the wrong hands.
For next week, we’ll be reading Economics in One Lesson by Henry Hazlitt.
It all begins here.
AI, p(doom), and The Future
These will likely be my least informed takes from here on out. But I’ll give them to you nonetheless, and maybe in a year or two we can look back and see all the things I’ve changed my mind about.
I. AI: What and When
Over the last few decades, our biggest companies and smartest thinkers have come to focus on building AGI as quickly as possible. AGI, or artificial general intelligence, is an ill-defined term that means different things depending on who you’re talking to. Here, all I mean is an AI that can do my job as well as or better than I can. For context, I’m a physics PhD student working on algorithm design for the ATLAS experiment at the LHC. All my work, however, is done on a computer, and is therefore low-hanging fruit for AI. Sure, currently our best AIs are in the form of chatbots, so universities still need people to ask the right questions and implement the answers. It won’t end there, unfortunately. It’s in the interest of big AI companies to “unhobble” the chatbots by, for example, allowing them to interface with a computer the same way a human would. This, combined with a few more capability improvements, could mean that I’ll be out of a job sooner than you might think. It also has much further implications, however.
Once AIs are good enough to do physics research—or similarly, AI/ML research—things likely get very weird very quickly. Put yourself in the shoes of a big AI company that has just finished training their newest model. Let’s imagine that you discover, to your delight, that the new AI is just about as smart as your best researchers. What do you do? Sell this new AI to big tech, replacing software engineers everywhere? Probably. Give it to our best biology labs, allowing for cheaper research, thus saving thousands of lives? Also probably. But more importantly, you also load up ten thousand copies of it on your own computers, using it to speed up your internal research a thousand fold.
This, however, creates a feedback loop. Your AIs discover decades of advancements within the following year, creating something much, much smarter than themselves. Subsequently, this next generation of AI does the same thing, again. So things get really weird really fast, and it all starts when AI can do ML research. Certainly at the end of this (or at the beginning, which seems more likely), I’ll be out of a job.
And now we’re left with a big, big question: When?
This recent essay by Leopold Aschenbrenner makes some very insightful observations, and writes them up much more clearly than I currently can. Leopold argues that there is a large variance, but if we naively follow the well-established trendlines, it puts us at about 2027. He makes some convincing arguments, so I encourage you to read it. But whether or not we get AGI in 2027, our biggest companies and smartest people are trying their hardest to make this happen as quickly as possible. And accordingly, I think there’s a meaningful probability we get AGI fairly soon. Like, if I decided to go into physics academia, I think there’s a meaningful probability my job would be gone before I got tenure.
II. Alignment and p(doom)
There’s a large detail I glossed over that I’ll now come back to: alignment. AI alignment is the problem of creating an AI that wants the same things that humans want. Training an AI is done by choosing some “objective” or “loss” function, which is then extremized using an algorithm known as gradient descent/ascent. Currently, our best AIs are trained with a very simple objective: predict the next token (word), and do so in a way that makes humans press the thumbs-up button instead of the thumbs-down button. When we do this, however, the AI doesn’t know about “good” and “bad”, and it also doesn’t care either way about the existence or nonexistence of humans; all it’s doing is trying to predict the next token. And if you end up with a very intelligent agent that wants literally nothing except to predict the next token, and it can make long-term plans, things probably won’t end very well. Let’s go through the thought process of an intelligent agent guided by this singular goal.
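To make the "objective function plus gradient descent" idea concrete, here's a toy sketch. This is purely illustrative and not any real lab's training code: the 5-token vocabulary, learning rate, and bigram setup are all made-up assumptions. It trains a tiny model by gradient descent to minimize the cross-entropy of next-token prediction.

```python
# Toy next-token predictor: a bigram model trained by gradient descent.
# Hypothetical illustration only; real LLM training is vastly larger.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 5                                       # made-up 5-token vocabulary
# Training data: (current token, next token) pairs
data = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)] * 20

W = rng.normal(scale=0.1, size=(VOCAB, VOCAB))  # logits for "next" = W[current]

def loss_and_grad(W):
    """Average cross-entropy of next-token prediction, and its gradient."""
    grad = np.zeros_like(W)
    total = 0.0
    for cur, nxt in data:
        logits = W[cur]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                    # softmax over the vocabulary
        total += -np.log(probs[nxt])            # cross-entropy for this pair
        dlogits = probs.copy()
        dlogits[nxt] -= 1.0                     # d(loss)/d(logits)
        grad[cur] += dlogits
    n = len(data)
    return total / n, grad / n

initial_loss, _ = loss_and_grad(W)
for _ in range(200):                            # plain gradient descent
    loss, grad = loss_and_grad(W)
    W -= 0.5 * grad
final_loss, _ = loss_and_grad(W)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The point of the toy: nothing in this loop mentions "good" or "bad" or humans at all. The only thing the optimization pressure ever pushes toward is lower next-token loss.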
It probably goes something like “okay, what can I do that will allow me to predict the next token with the highest probability? Oh, I know! If I make myself smarter, I’ll be more likely to predict the next token correctly. How can I make myself smarter? Oh, I know! I’ll train a larger network! So I’ll need to gain control over as much energy as possible…”
You can see where this is going. I know it seems a bit extreme, but the main point is as follows: if you have some singular objective, you’re willing to sacrifice an infinite amount of anything else for marginal gains toward that objective. This becomes a problem when AIs have the ability to recursively self-improve. Furthermore, it’s in an AI’s best interest to cooperate with humans until it thinks it’s capable of wiping us out, and then to kill us all at once. And it would gladly do this in order to get more energy and train a smarter model. After all, those pesky humans are using an awful lot of energy doing things other than predicting the next token. Let’s fix that.
So we learn something from this thought experiment: it is of the utmost importance that whatever goal the AI has is aligned with human goals. Hence the name of the alignment problem. The alignment problem is solvable, but we have not solved it yet. And many people have tried.
I think there’s a meaningful probability that we don’t solve the problem before AI starts recursive self-improvement. If this happens, the most likely outcome is that one day all humans fall over dead at the same time. Among those who talk about the alignment problem, the probability that AI kills everyone is known as p(doom). If you know me, you also know that I’ve been worried about the alignment problem for a while now. For myself, I have 10% < p(doom) < 90%.
III. The Future
So, now what? Once AGI is created, I’ll either die, or lose my job. Sounds like a great future, doesn’t it? But there is one thing I can do to prepare for AGI in the near future: bet on it with my money. If I’m right and we really do create AGI, I’ll win big in the market. And if I’m wrong, I’ll get to live and keep my job as a physicist. There’s only one problem: I’m a clueless idiot with no knowledge of investing, economics, history or statistics.
That’s where this blog comes in. Each week, I’ll read a book/paper(s) about one of these topics and discuss it. We’ll learn together, talk together, and prepare for the future together. At the end of each post, I’ll say what we’ll be reading for the following week. I hope you’ll join me.
For the first week, we’ll be reading a series of essays collectively called Situational Awareness, written by Leopold Aschenbrenner. You can find it here.