[Note: I’m going to be winding World of Wonders up in about a month. In the meantime, enjoy!]
I've spent most of my career working on new technology. As someone with experience in pure science, engineering, and business, I'm ideally placed to help companies and universities bridge the gap between what works in the lab and what works in the world, and I've had the privilege of seeing a lot of technology that failed to make the leap. Most attempts to do something new fail.
This is the experience that informs my view of large language models (LLMs), of which ChatGPT is the most famous.
These systems are gigantic lookup tables with very high dimensionality and surprisingly limited computational power. They aren't even Turing Complete: a Turing Complete system is one that can compute anything a Turing Machine can, which by the Church-Turing thesis is anything that can be computed. Humans are not just Turing Complete: we can solve problems a Turing Machine can't. So whatever a large lookup table is doing, it's not doing anything very much like what humans are doing.
A lookup table is a table of values in which keys are used to look things up. The keys for LLMs are the data: the "prompt" and anything that follows it. Because large lookup tables have no memory, when people interact with them the entire body of the preceding interaction has to be sent back as an input, so it can be used to look up the next tokens in the table.
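To make the framing concrete, here's a toy caricature in Python. The entries are invented for illustration--a real system computes the mapping on the fly rather than storing it--but the input/output behavior is the same: key in, most likely next token out.

```python
# A toy caricature of the lookup-table framing: the key is the entire
# conversation so far, the value is the most likely next token.
# These entries are invented for illustration.
table = {
    "Once upon a": " time",
    "Once upon a time": ",",
    "Once upon a time,": " there",
}

prompt = "Once upon a"
while prompt in table:
    prompt += table[prompt]  # append the looked-up token, then look up again

print(prompt)  # -> "Once upon a time, there"
```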
A "token" is fragment of text. Less than a word, more than a letter. Alphabetic languages like English can be broken down into a bunch of these multi-letter combinations that have nicer statistical properties than letters. Things like "th", "er", "ion" and so on. Tokenization bakes in the low-level statistical properties of the language, which is why large lookup tables perform poorly on languages other than English.
The entire text of the preceding interaction is used as the input to the next interaction, because large lookup tables have no memory and no means of representing context. One could, if one desired, and if the lookup table companies made the appropriate API available, simply feed one of their lookup tables an interaction string of arbitrary length (up to the table's input limit) and the system would respond with the most likely next thing.
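In code, that statelessness looks something like the loop below, where complete() stands in for a hypothetical stateless model call. The transcript is the only memory there is:

```python
# Why chat interfaces resend everything: the model call is stateless, so
# the transcript we concatenate each turn is the only "memory" there is.
# complete() stands in for a hypothetical stateless model API.
def complete(transcript: str) -> str:
    """Hypothetical stateless call: full transcript in, continuation out."""
    return "<most likely continuation of ..." + transcript[-30:] + ">"

transcript = ""
for user_turn in ["Hello!", "What did Yvan say about me?"]:
    transcript += "User: " + user_turn + "\nAssistant: "
    reply = complete(transcript)  # the ENTIRE history goes back in
    transcript += reply + "\n"

print(transcript)
```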
It would be like stuffing both halves of a conversation into a person's brain and then seeing what they said next, as if they were waking from a hypnotic state, if hypnosis worked the way it's imagined in popular culture.
But it doesn't.
Humans don't work that way.
Neither does any other conscious being. Conscious beings have memory, and their processing is ruthlessly contextual: saliency is one of our biggest biases. Things that stand out--that are salient--condition how we react. Consider this interaction from Art by Yasmina Reza, a play I'm currently rehearsing:
SERGE You’ve lost your sense of humor Marc. You really have lost your sense of humor. When I was talking to Yvan the other day we agreed you’d lost your sense of humor. Where the hell is he? He’s incapable of being on time, it’s infuriating. We’ll miss the beginning!
MARC Yvan thinks I’ve lost my sense of humor?
Does Marc react to Serge's outburst about Yvan's lateness? Of course not. He wants to know about Yvan's opinion of him. That's what conscious social beings do. That's what's salient.
When I give ChatGPT this prompt:
A man named Serge is talking to his friend Marc. They are very old friends. But recently they have been in an argument about a painting that Serge has bought. It's a canvas, about five feet by four. White. The background is white and if you screw up your eyes you can make out some fine white diagonal lines. Serge paid a huge sum of money for this painting, and Marc thinks this is a terrible thing for him to have done. Both of them have talked to their mutual friend Yvan about this. Marc has told Yvan that he thinks Serge has lost his sense of humor. Serge has told Yvan that he thinks Marc has lost his sense of humor. Now Serge and Marc are talking while waiting for Yvan so they can all go out to dinner and a movie together, and Serge says: "You’ve lost your sense of humor Marc. You really have lost your sense of humor. When I was talking to Yvan the other day we agreed you’d lost your sense of humor. Where the hell is he? He’s incapable of being on time, it’s infuriating. We’ll miss the beginning!" What does Marc reply?
ChatGPT responds with what looks to me like a copyright violation:
Marc replies, "Oh, so now you and Yvan are some kind of humor police, Serge? Just because we disagree about this painting doesn't mean I've lost my sense of humor. It's called having different tastes. And for your information, maybe Yvan is late because he's avoiding having to hear about your 'masterpiece' again."
The word "masterpiece" is a tell. It figures significantly in the play--there is a lengthy argument between Serge and Marc about Serge's use of the work "masterpiece"--and OpenAI, the Microsoft-backed corporation that plausibly violated everyone's copyrights to fill in the numbers in their large look up table, has very plausibly violated Yasmina Reza's copyright. It is quite unlikely that ChatGPT would use the word "masterpiece" in its response otherwise.
If I start a new chat and change the prompt slightly:
A man named Jeff is talking to his friend Tom. They are good friends, but recently they have been in an argument about a painting that Jeff has bought, which was really expensive, but is basically just a plain, pale nothing. Tom thinks Jeff has let arrogance and a need to belong go to his head. Both of them have talked to their mutual friend Ray about this, and both of them have told Ray they think the other one can't see the humor in the situation. Now Jeff and Tom are talking while waiting for Ray so they can all go out to dinner and a movie together, and Jeff says: "You don't think anything's funny any more. Ray agrees with me. He thinks you never see the joke. But why the hell isn't Ray here? That's what really bugs me! It drives me crazy that he's never on time! What on Earth could be keeping him?" What does Tom reply?
It completely misses the salient point:
Tom replies, "Well, maybe Ray's just trying to make a grand entrance tonight. You know, adding a bit of suspense to the evening. Or maybe he's caught up in traffic. Either way, let's not let his tardiness ruin our plans. We can always start without him and fill him in when he arrives."
Exactly the same structure, exactly the same information, totally different replies. The second one misses the point--the vital, salient point--that Tom, like Marc, should want to know what his friend said about him, because that's what conscious beings do.
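Anyone can rerun this comparison. Here's a sketch using the OpenAI Python client--the model name is an assumption, and the outputs will vary from run to run, which is rather the point:

```python
# Rerun the two-prompt comparison with the OpenAI Python client.
# The model name is an assumption; paste in the full prompts quoted above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "A man named Serge is talking to his friend Marc...",  # full prompt above
    "A man named Jeff is talking to his friend Tom...",    # full prompt above
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
    print("---")
```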
The large lookup table, unsurprisingly, exhibits no understanding of the material, no creativity, and no insight into the fragile human psyche, because how could it? It's a lookup table, and while I'm 100% behind the idea that physical states in silicon may one day be able to generate mental states--subjective experiences, which is what consciousness is--lookup tables exhibit none of the structures most plausibly required to do this.
Just to repeat that: I'm 100% behind the idea that physical states in silicon may one day be able to generate mental states. I think it's totally plausible that one day machines will be conscious, have mental states, and experience subjectivities. There's nothing special about meat.
But large lookup tables are not on the road to that, and the companies developing them face multiple challenges, including declining subscription numbers, rising costs, and multiple lawsuits from rights-holders over the misuse of copyrighted material in training these systems.
In short: they aren't living up to the hype.
Hype is commonplace in technology. Anyone competent to hold an opinion about the future of large lookup tables is intimately familiar with something like the Hype Cycle graph shown at the top of this post. The "standard" hype cycle from the Gartner Group includes only the so-called "Plateau of Productivity" on the right-hand side, whereas I've included the fairly common "bounce" and "doom" lines as well.
This cycle has been played out again and again for decades, in hardware and software, in algorithms and components and machines. Tunnel diodes! Fifth generation computers! Fuzzy logic! Perceptrons! Fractal image compression! Gallium arsenide!
When it comes to large lookup tables, we are somewhere near the peak of the curve.
OpenAI is "open" because they are monopoly-seeking. This is the game in new technology: hype and subsidize until you've driven out the competition and then use your monopoly pricing power, protected by your IP and a phalanx of lawyers, to maintain the rents within your fiefdom. Which is not capitalism, but feudalism (or maybe "neo-manorialism" is a better term.)
To do this, though, your technology has to work. It has to deliver on the miraculous promises of your hype.
Sometimes--rarely--this happens. Microsoft, Amazon, and Google are all companies that did pretty well in this regard. Monopoly is worth money: Google apparently paid $28 billion last year to ensure it was the default search engine on various platforms.
Theranos, on the other hand, is an example of a company whose technology never lived up to its hype--a story that ended with the founder convicted of fraud.
There are various common reasons why, even in the absence of fraud, a new technology slips away to nothing:
1) It doesn't work, at least not well enough to be useful. There are a bunch of specific failure modes in this regard. It may work well on some special cases, but not in general. It may work well almost all the time, but fail catastrophically when it does fail. Ball-in-socket heart valves were like this: they lasted longer and were generally more reliable than the flap valves that were eventually adopted, but when they failed, they failed completely, and without warning. Flap valves wear out slowly, and so can be replaced well ahead of failure.
2) It costs too much. This is really a special case of "it doesn't work": if something doesn't work at a price users are willing to pay... it doesn't work.
3) Something better and cheaper comes along. Tunnel diodes went this way. In the early 1960s we were told: "Today, tunnel diodes are finding increasing uses in high-speed switching circuits for computers, in microwave equipment, and in electronic instrumentation." That's a direct quote from an engineering applications guide. Then alternatives like Gunn diodes proved to be better, cheaper, and more stable, to the point that tunnel diodes have all but vanished from the market.
4) It runs afoul of entrenched interests. I'm more than half convinced that the failure of early nuclear power development was due to coal industry lobbying behind the scenes, aided by useful idiots out front who were more than willing to put politics ahead of the planet. Better technology can be killed by established players who are heavily invested in the status quo.
5) Poor product/market fit, which is a nice way of saying it solves a problem nobody has... at least nobody with enough corporate money or partisan political power to make solving it worthwhile.
OpenAI and others have tech that kind of works, but the only reason it's open to all right now is because investors are subsidizing the hell out of it. It's eating energy and capital, and is already attracting a lot of negative legal attention, which is really not what the founder of a startup company wants to be spending their time on. Trust me on this.
So where do large lookup tables go from here?
One prediction I'm fairly confident of is that anyone depending on free or consumer-tier access to ChatGPT or similar had better be prepared to lose it in 2024 or 2025. There are smaller models that can run locally without cloud computing resources, and I recommend that anyone relying on free or cheap access to models on the Web investigate them.
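As a minimal sketch of what self-hosting looks like, here's the Hugging Face transformers library running a small text-generation model locally. The model name is just an example of something small enough for ordinary hardware; substitute whatever you prefer:

```python
# A small self-hosted model via the Hugging Face transformers library.
# "distilgpt2" is just an example of a model small enough for ordinary
# hardware; substitute any locally downloadable model you prefer.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

result = generator(
    "Serge and Marc are arguing about a painting when Marc says:",
    max_new_tokens=60,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```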
Beyond the cost, one reason for closing out ordinary users is that the only reason we know about a whole lot of the issues--both technical and legal--with ChatGPT and others is that they are open, which makes it easy for researchers to investigate them. This has allowed people like Getty Images to identify their watermark on outputs from image generators, which is strong evidence of copyright violation: there's no way the watermark could get there without the images being used in ways specifically forbidden by license, and it's hard to claim fair use when you've copied the entire contents of someone's image database.
No rent-seeking neo-manorial monopolist is going to look at that situation and say, "I guess we'd better stop with the copyright violations." They're going to say, "I guess we'd better make sure nobody can see what we're doing."
So OpenAI et al will become closed.
Beyond that, growth in lookup-table-driven applications will stop outside of the largest corporations with the biggest legal budgets, because no one will be willing or able to indemnify users facing lawsuits from rights-holders. This will only get worse now that whole movies can be generated. Getting sued is bad. Getting sued by Disney is worse.
If I were interested in making money in this space I'd aim to build a "source detector" for text and images known to be generated by one of these tools. The conditional question, "GIVEN this has been machine generated, what are some of the sources that went into it?" is quite different from, and likely easier than, "Was this machine generated?"
In the example above, knowing that the output was machine generated, it's statistically likely that Art was an input: Serge, Yvan, Marc, and "masterpiece" probably don't co-occur in many other works.
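Here's a toy sketch of the idea: given text known to be machine generated, score candidate source works by how many of their distinctive terms show up in the output. The candidate works and marker terms below are hand-picked assumptions, not a real index--a serious version would build rare-term statistics from the rights-holders' corpora:

```python
# A toy "source detector": score candidate source works by how many of
# their distinctive marker terms appear in machine-generated output.
# The candidate works and marker terms are hand-picked assumptions.
import string

GENERATED = '''Marc replies, "Oh, so now you and Yvan are some kind of
humor police, Serge? ... hear about your 'masterpiece' again."'''

CANDIDATES = {
    "Art (Yasmina Reza)": {"serge", "marc", "yvan", "masterpiece"},
    "Waiting for Godot": {"vladimir", "estragon", "godot", "lucky"},
}

# normalize the output into lowercase, punctuation-stripped tokens
tokens = {w.strip(string.punctuation) for w in GENERATED.lower().split()}

for work, markers in CANDIDATES.items():
    hits = markers & tokens
    print(f"{work}: {len(hits)}/{len(markers)} markers present {sorted(hits)}")
```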
I'm sure there are law firms out there that would be interested in that kind of tool.
Finally, I have held from the beginning that these tools are not doing anything like what the human brain is doing: they are just finite state machines, and their failure modes are a consequence of their structural limitations, which means they cannot be fixed by minor tweaks. Large lookup tables are inherently prone to making stuff up, lying, being inconsistent, and generating incoherent output for prompts that land in parts of their statistical space with poor coverage in the copyright violations used to compute the table entries. This makes them unsuited to a wide range of the tasks they are being aimed at. Strangely, they will be useful as assistants to humans, but the big money isn't in that: it's in replacing humans entirely.
Operating autonomously, LLMs will be good for generating partisan and corporate drivel so long as the tools are cheap, but they won't remain cheap once it becomes clear there is no way to monetize them at scale.
Bad actors will continue to use them to do bad things, and laws will adapt in most places to impose increasingly harsh penalties on this, because the bad things will impact rich people. They already do.
The history of "AI" has seen more spectacular rises and crashes than any other area of technology over the past fifty years, and I expect the meteoric rise of large lookup tables will be followed by an equally spectacular crash.
They will still find niche uses, including some possibly life-changing ones. Medical diagnostics is an area of closed-ended, information-rich processing that is finally starting to benefit from automated approaches, for example. But these will be relatively small, specialized models run in carefully controlled conditions, and subject to considerable human oversight.
So we won't be welcoming our Turing Incomplete large lookup table overlords any time soon, and our access to them will become sharply limited in the next few years. Smaller, specialized, self-hosted models will have some long-term, important, niche uses, but their overall impact will be small compared to the major disruptions of the past twenty-five years, from search (Google) to online shopping (Amazon) to desktop computing (Apple and Microsoft).
The fact that OpenAI is just one of many companies running "attention-based" transformer models--the core technology underlying all of these systems--is the strongest predictor that investors will lose interest: the tech is just too damned simple for there to be any hope of one company gaining monopoly status. There will be no "Google of AI" because none of the monopoly-seeking entities owns anything like the patented PageRank algorithm. There are just the specific proprietary models, which are turning out to be not that hard to reproduce.
Without the incentive of potential monopoly, and with the recognition that smaller, simpler models can do 80% of what the big fat expensive models can do, and that no amount of future investment in the big fat expensive models will ever stop them generating gibberish often enough to matter, investors will pull the plug, and the technology boosters will move on to the next thing. I hear fuzzy logic is ripe for a comeback. Or maybe gallium arsenide.