The math behind ChatGPT

A Russian mathematician invented the math behind ChatGPT in 1906 while trying to humiliate a priest in an academic feud, and he died 16 years later without knowing any of it.

His name was Andrey Markov. His nickname was Andrey the Furious. And the thing he built was never meant to be about language at all.

Here is the story almost nobody tells you.

Russia in 1905 was fracturing. The Russo-Japanese War was bleeding the country. Revolution was in the streets. And inside the Imperial Academy of Sciences, two mathematicians were tearing each other apart over a question that had nothing to do with either of them professionally.

The priest was Pavel Nekrasov, a theologian turned mathematician who believed numbers could prove God’s design. His argument was this: the Law of Large Numbers, the foundational rule of probability theory, only works when events are independent of each other. Like coin flips. No connection between them. And if human decisions follow the same pattern, he said, then human beings must be making truly free, independent choices. Mathematics, in his telling, proved free will. Which meant it proved the soul. Which meant it proved God.

Markov found this professionally offensive and personally infuriating.

He was a fierce atheist who left the Russian Orthodox Church by choice: in 1912 he sent a letter asking the Holy Synod to excommunicate him, in protest at the Church’s excommunication of Tolstoy. He had no patience for what he called the abuse of mathematics. The idea that a priest was using probability theory to smuggle theology into science made him furious in the precise way his nickname suggested.

So he set out to destroy the argument.

His proof was elegant and brutal. He showed that the Law of Large Numbers does not require independence at all. Averages can stabilize even when every event is connected to the one before it. Free will had nothing to do with it. The soul had nothing to do with it. Nekrasov’s entire theological superstructure collapsed on a mathematical technicality.

But Markov needed a real-world demonstration. Something concrete. Something that would make the proof undeniable.

He picked up a copy of Alexander Pushkin’s Eugene Onegin.

Not to read it. To count it.

He sat in his study in St. Petersburg and wrote out the first 20,000 letters of the poem in one continuous string, stripping out every space and every punctuation mark until it was just a raw chain of characters. Then he began counting. Vowel or consonant. What follows what. How often does a vowel follow a vowel. How often does a consonant follow a vowel. Week after week, letter by letter, by hand.

What he found was that the letters were deeply dependent on each other. A vowel is far more likely to follow a consonant than to follow another vowel. The sequence is not random. Each letter is influenced by what came before it. And yet across 20,000 letters, the overall frequency of vowels converged to a stable number. Dependence and statistical regularity could coexist.
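The counting itself is easy to sketch in code. What follows is a minimal illustration, not a reconstruction of Markov's own tables: it assumes English vowels rather than Russian ones, and the function name `transition_stats` and the sample sentence are made up for the example.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowels; Markov worked with Russian letters


def transition_stats(text):
    """Count vowel/consonant pairs in text, ignoring everything but letters."""
    # Strip spaces and punctuation, keeping one continuous string of letters.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    # Label each letter: "V" for vowel, "C" for consonant.
    labels = ["V" if ch in VOWELS else "C" for ch in letters]
    # Count adjacent pairs, e.g. ("C", "V") is a consonant followed by a vowel.
    pairs = Counter(zip(labels, labels[1:]))
    # How many pairs start with a vowel, and how many with a consonant.
    starts = Counter(labels[:-1])
    return {
        "p_vowel": labels.count("V") / len(labels),
        "p_vowel_after_vowel": pairs[("V", "V")] / starts["V"],
        "p_vowel_after_consonant": pairs[("C", "V")] / starts["C"],
    }


# Hypothetical sample; any long stretch of text will do.
SAMPLE = "It was the best of times, it was the worst of times"
for name, value in transition_stats(SAMPLE).items():
    print(f"{name}: {value:.3f}")
```

If the two conditional probabilities come out very different from each other, the letters are dependent, which is the effect the paragraph above describes.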

Nekrasov was wrong. The math worked without independence. Free will was not hiding inside probability theory. Markov had proven it on the back of a love poem.

He called the structure he had discovered a chain. What we now call a Markov chain.

The idea is simple enough to explain in one sentence. The next state of a system depends only on its current state, not on everything that came before it. Each step carries just enough memory to take the next step. No more.
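A small simulation makes both halves of the idea concrete: each step looks only at the current state, and the long-run average still settles down. This is a sketch under stated assumptions. The two transition probabilities are the figures commonly reported for Markov's count (a vowel after a vowel about 12.8% of the time, a vowel after a consonant about 66.3%); they do not appear in this article, so treat them as illustrative. The function names are invented for the example.

```python
import random

# Assumed chance that the next letter is a vowel, given the current letter's class.
# Illustrative figures, not taken from this article.
P_VOWEL_AFTER = {"V": 0.128, "C": 0.663}


def simulate_vowel_fraction(n_steps, seed=0):
    """Run a two-state chain and return the fraction of steps that were vowels."""
    rng = random.Random(seed)
    state = "C"  # arbitrary starting state
    vowels = 0
    for _ in range(n_steps):
        # The next state is drawn using only the current state, nothing earlier.
        state = "V" if rng.random() < P_VOWEL_AFTER[state] else "C"
        vowels += state == "V"
    return vowels / n_steps


def stationary_vowel_fraction():
    """Long-run vowel share predicted by the chain's balance equation."""
    p_cv = P_VOWEL_AFTER["C"]      # consonant -> vowel
    p_vc = 1 - P_VOWEL_AFTER["V"]  # vowel -> consonant
    return p_cv / (p_cv + p_vc)


for n in (100, 10_000, 200_000):
    print(n, round(simulate_vowel_fraction(n), 4))
print("predicted", round(stationary_vowel_fraction(), 4))
```

Every letter here depends on the one before it, yet the simulated fraction should drift toward the predicted value as the run gets longer. That is the Law of Large Numbers working without independence.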

What Markov could not have imagined is what that idea would become.

Every language model that exists today builds on this logic. When ChatGPT reads your prompt and generates the next word, it is doing a vastly more sophisticated version of what Markov did with Pushkin’s letters. Where Markov’s chain looked back a single letter, the model treats the whole conversation so far as its current state and calculates what should come next based on patterns in everything it was trained on. The core mathematical intuition, that sequences have structure, that the next element depends on what came before, that you can model language as a chain of dependent probabilities, is Markov’s. It has been Markov’s since 1913.
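To see the family resemblance, here is a toy word-level chain. It is a deliberately crude sketch and not how ChatGPT is implemented: a real model uses a neural network over a long context, while this one looks back exactly one word. The corpus and the names `build_chain` and `generate` are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words that followed it in the text."""
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Walk the chain, picking each next word using only the current word."""
    rng = random.Random(seed)
    out = [start]
    # Stop at the requested length, or early if the current word has no followers.
    while len(out) < length and followers.get(out[-1]):
        # Repeated entries make frequent followers proportionally more likely.
        out.append(rng.choice(followers[out[-1]]))
    return out


CORPUS = "the cat sat on the mat and the dog sat on the rug".split()
chain = build_chain(CORPUS)
print(" ".join(generate(chain, "the", 8)))
```

Every pair of neighbouring words in the output also appears somewhere in the corpus, because the chain can only replay transitions it has already seen.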

His paper on Eugene Onegin was presented to the Imperial Academy of Sciences on January 23, 1913. The audience was mathematicians. The context was a dispute about free will. Nobody in that room was thinking about computers. There were no computers. The first electronic computer would not exist for another three decades.

He died in 1922, nine years after the paper, in the early chaos of the Soviet era. He was 66. He had spent his final years watching the Tsar fall, the revolution rise, and his country become something unrecognizable. He never saw a transistor. He never imagined a machine that processes language. He thought he had settled an argument with a priest.

The argument he actually settled was one nobody had asked yet.

Today his chains are inside every search engine, every voice assistant, every spam filter, every autocomplete. The 2024 paper Large Language Models as Markov Chains shows formally what practitioners have known informally for decades: the inference mechanism of an autoregressive model with a finite vocabulary and a fixed context window, the kind of model behind GPT-4, Claude, and Gemini, can be characterized as a Markov chain operating over sequences of tokens. The math is his. The name on the paper is someone else’s.
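One way to read that claim, written out as a sketch rather than in the paper's own formulation: assume the model can see at most the last K tokens, and call that window the state. The symbols x_t (the token at step t), s_t (the state) and K (the window length) are notation chosen here for illustration.

```latex
% Assumption: the model conditions on at most the last K tokens.
s_t = (x_{t-K+1}, \dots, x_t), \qquad
P(x_{t+1} \mid x_1, \dots, x_t) = P(x_{t+1} \mid s_t), \qquad
s_{t+1} = (x_{t-K+2}, \dots, x_{t+1}).
```

The next window is the current window with its oldest token dropped and the new token appended, so it depends on the past only through the current window. That is the Markov property, applied to whole windows of tokens instead of single letters.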

There is a version of this story where Nekrasov wins the argument. Where Markov decides the priest is not worth his time. Where nobody counts 20,000 letters in a poem to settle a theological dispute.

In that version, the chain is never invented. Or it is invented later, by someone else, for different reasons, on a different timeline.

We got this version instead. The furious atheist. The love poem. The weeks of counting. The proof that destroyed a man’s theology and accidentally handed the 21st century its most important mathematical tool.

Nekrasov wanted to find God in the numbers.

What he found instead was Markov. And Markov found something neither of them was looking for.

Source: Ihtesham Ali
