
One surprising thing about LLMs nobody is talking about

Small reflections on the unexpected properties of syntactic networks.

As you probably know, LLMs (Large Language Models) have become a morally charged topic.1 As such, since their popularization, I have been bombarded by a never-ending stream of the worst possible takes. From the tech-bros claiming that, somehow, LLMs will write the next bestselling book, make custom movies, or solve all the world’s problems in the next 2–3 years (spoiler: they will not), to companies using “AI” where it makes zero sense, to the crowd of people feeling smart by posting screenshots of riddles that chatbots cannot answer (as if anybody would care) or throwing tantrums against “AI” while simultaneously showing how little they know about the topic.

So let’s forget about all that for a moment. Just a moment.

Let’s breathe. Ah! Isn’t it nice?

I would like you to forget about any pre-existing opinions on LLMs. Disconnect yourself from the material world and follow me into the pure mathematical domain. Be amazed by one of their wonderful properties, something so beautiful that it gives me the same feeling of awe I get from watching the spiral of a galaxy or the rings of Saturn.

What are LLMs

First, let’s clarify this. Large Language Models (LLMs) are models of human language. They can be used for text generation, but intrinsically, that’s not their only function.

I want to stress this because the equivalence between LLMs and “text generation” is repeated so often that even ChatGPT gets this wrong.

So, LLMs are models, and any model can be used to generate things. You put an input in, run the model, and get an output. Then repeat. And repeat.

Okay. If generation is the process of applying LLMs (and other things) to an input, what is, materially, an LLM?

Roughly speaking, it’s a bunch of numbers. However, it is more useful to think of the language model as a container of word embeddings.

You can imagine a word embedding as a super-long list of numbers representing a point in a multidimensional space. While our physical space has three dimensions (up-down, left-right, forward-backward), the “space of words” has hundreds or even thousands of dimensions. Therefore, each word can be represented by 200, 500, or even 1,000 numbers.

Why is this useful? Because we can do math with numbers and, consequently, do math with words. For instance, I can find the nearest points to X to obtain words related to X. Or I can subtract and add words together to get new words. As a nice example, you can do things like “king” - “man” + “woman” = “queen”.
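To make the "math with words" idea concrete, here is a minimal sketch using gensim and a small pretrained GloVe model fetched through gensim's own downloader. The specific model is just an assumption for this example; it is not the internal embedding table of any particular LLM.

```python
# Toy illustration of "doing math with words", using gensim and a small
# pretrained GloVe model (an assumption for this sketch, not the embeddings
# of any particular LLM).
import gensim.downloader as api

# Downloads ~130 MB of 100-dimensional GloVe vectors on first run.
vectors = api.load("glove-wiki-gigaword-100")

# Nearest points to a word: words that appear in similar contexts.
print(vectors.most_similar("firearms", topn=5))

# The classic analogy: king - man + woman ~ queen.
# On models of this kind, the top hit should be 'queen'.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```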

Figure 1. An example of a word space for the word “firearms” (to visualize it, we need to project the space into 3D). As you can see, the nearest points correspond to words such as “tanks,” “fire,” “knife,” “weapon,” “rifle,” etc. You can play with this visualizer here

The algorithms learn these numbers only by examining which words are used in the same context.
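To give a feel for how numbers can be learned only from context, here is a deliberately tiny, count-based sketch: build a word-by-word co-occurrence matrix from a toy corpus and compress it with an SVD so that each word gets a short vector. The corpus and window size are invented for the example, and modern models learn their embeddings very differently (by gradient descent on a prediction task), but the principle is the same: similar contexts produce similar vectors, with no notion of meaning anywhere.

```python
# Tiny, count-based illustration: embeddings emerge purely from which words
# co-occur, with no notion of what the words mean.
# (Toy corpus and window size are invented for this sketch; real models learn
# embeddings by gradient descent on a prediction task instead.)
import numpy as np

corpus = [
    "the king rules the kingdom",
    "the queen rules the kingdom",
    "the cat chases the mouse",
    "the dog chases the cat",
]

tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count how often each pair of words appears within a +/-2 word window.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

# Compress the co-occurrence matrix: each row becomes a short word embedding.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
embeddings = U[:, :3] * S[:3]   # keep 3 dimensions for a toy "word space"

def nearest(word):
    v = embeddings[index[word]]
    sims = embeddings @ v / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims) if vocab[i] != word][:3]

print(nearest("king"))   # words used in similar contexts, e.g. "queen"
```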

The Beautiful Thing

After this long introduction, it is time to finally talk about the beautiful thing.

As we saw before, there is no place in LLMs where we take into account the semantics of language (that is, “what the words mean”). LLMs only consider the relative positions of words (e.g., where words are positioned with respect to each other, how often they appear in a text, which words appear frequently close to other words, and so on). That is, LLMs only learn about the syntax of languages.

And yet, LLMs’ text generation has shown us that the syntactic network of a language carries with it part of the semantics. LLMs can take a text input, operate only on its syntactic properties, and output text that makes sense semantically.

Yes, I know. LLM-based text generation can fail, hallucinate, and produce misinformation and total nonsense. But the fact that it can produce something relevant to the question, something that makes sense at all, with the unexpectedly good current level of accuracy, is the closest thing we have to a computational-linguistics miracle.

Of course, it is only “part of the semantics.” There are numerous classes of semantic elements that LLMs utterly fail at. For instance, temporal reasoning cannot be inferred by syntax alone because, by definition, it requires a higher-level representation.2

Nevertheless, this chunk of learned semantics is very useful for two reasons. First, it provides us with a computational foundation to experiment with the meaning of meaning. In other words, we have something concrete representing semantic meaning.3 Second, it helps us identify which classes of “semantic concepts” are NOT derivable from language syntax, which is valuable in itself.

LLMs from here

LLMs are here to stay, and their generative aspect is probably the least interesting part (at least for me). What they do is provide new tools to interact with the complex nightmare that is human language.

For this reason, I see two directions:

  1. More powerful models that can do more things a bit better, but with diminishing marginal returns, at least until the next breakthrough gives these systems a window into the semantic landscape they cannot reach with LLMs alone (a future “LLMs + something” architecture, or something completely new).
  2. Smaller, on-device models (hopefully run on specialized neuromorphic chips). They will not be used (primarily) for text generation but as middleware for classic rule-based AI systems. Why? Because human language is a mess, and LLMs are the best tool we have to work with that mess in a more robust way.

Anyway, I am well aware that the AI landscape at the moment is overhyped and full of jerks, so I don’t know how many people will enjoy me talking about the ancillary properties of AI systems or share my enthusiasm.

But for what it’s worth, I enjoyed writing about something that positively surprised me. A property that can show us a tiny glimpse of how the organic model we keep in our skulls may have evolved from the unordered chaos of the world. And I think that’s neat.

Photo by Alina Grubnya on Unsplash


  1. Quite predictably, as we live in the “dumbest era” of human civilization, where everything is “morally charged.” ↩︎

  2. Complex temporal queries (such as how to minimize the shipping time of a set of items in a logistics problem) require planning, and LLMs cannot plan, as shown by Sébastien Bubeck et al. in their paper “Sparks of Artificial General Intelligence: Early Experiments with GPT-4,” arXiv preprint arXiv:2303.12712 (2023). ↩︎

  3. We can already see some fun experiments on this front. For instance, Golden Gate Claude teaches us how to manipulate neural networks to steer the output of the model, but it is also a model for “intrusive thoughts” and “semantic poisoning.” Fascinating stuff. ↩︎
