A Markov model is anything that has state, emits tokens based only on its current state, and undergoes state transitions. The token emissions and state transitions are usually probabilistic -- a statistical/probabilistic analogue of a state machine. A deterministic state machine is the special case where the transition probabilities are degenerate (concentrated at a unique point).
For a Markov model to be a non-vacuous, non-vapid discussion point, however, one needs to specify very precisely the relationships allowed between states and tokens/observations: whether the state is hidden or visible, discrete or continuous, fixed or variable context length, causal or non-causal ...
The simplest such model is one where the state is a specified, computable function of the last k observations. One such simple function is the identity function -- the state is then just the last k tokens. This is called an order-k Markov chain and is a restriction of the bigger class of Markov models. (A minimal sketch of this case follows below.)
One can make the state a specified, computable function of the k previous states and the k most recent tokens/observations (equivalently, RNNs).
The functions may be specified only up to a class of computable functions, finite or infinite in size. They may be stochastic in the sense that they define only the state transition probabilities.
You can make the context length a computable function of the k most recent observations (so the contexts can vary in length), but you have to ensure that the contexts are always full for the model to be well defined.
The context length can be a computable function of both the ℓ most recent states and the k most recent observations.
Crazier ones emit more than one token based on the current state.
On and on.
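To make the simplest case concrete, here is a minimal order-k Markov chain sketch in Python. This is my own illustration, not anything from the article: the function names, the character-level choice, and k=3 are all assumptions.

    import random
    from collections import defaultdict

    def train(text, k=3):
        # State = the last k characters; record every character observed after each state.
        table = defaultdict(list)
        for i in range(len(text) - k):
            table[text[i:i + k]].append(text[i + k])
        return table

    def generate(table, k=3, length=300):
        state = random.choice(list(table))        # start from a random observed context
        out = list(state)
        for _ in range(length):
            followers = table.get(state)
            if not followers:                     # dead end: this context never continues
                break
            out.append(random.choice(followers))  # sample the next token given the state
            state = "".join(out[-k:])             # the new state is just the last k tokens
        return "".join(out)

    print(generate(train(open("corpus.txt").read(), k=3)))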
Not all Markov models are learnable.
> Itheve whe oiv v f vidleared ods alat akn atr. s m w bl po ar 20
Using pairs of consecutive characters (order-2 Markov model) helps, but not much:
> I hateregratics.pyth fwd-i-sed wor is wors.py < smach. I worgene arkov ment by compt the fecompultiny of 5, ithe dons
Triplets (order 3) are a bit better:
> I Fed tooks of the say, I just train. All can beconsist answer efferessiblementate
> how examples, on 13 Debian is the more M-x: Execute testeration
LLMs usually do some sort of tokenization step prior to learning parameters. So I decided to try out order-1 Markov models over text tokenized with byte pair encoding (BPE).
Trained on TFA I got this:
> I Fed by the used few 200,000 words. All comments were executabove. This value large portive comment then onstring takended to enciece of base for the see marked fewer words in the...
Then I bumped up the order to 2:
> I Fed 24 Years of My Blog Posts to a Markov Model
> By Susam Pal on 13 Dec 2025
>
> Yesterday I shared a little program calle...
It just reproduced the entire article verbatim. This makes sense: BPE keeps merging until no adjacent pair of tokens repeats, so every length-2 context occurs at most once in the training text and the order-2 Markov transitions become fully deterministic.
I've heard that in NLP applications, it's very common to run BPE only up to a certain number of different tokens, so I tried that out next.
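For reference, a toy version of that capped BPE loop looks roughly like this. It's a sketch under my own assumptions, not the commenter's actual tokenizer; max_vocab=800 just mirrors the limit mentioned next. Stopping early leaves some repeated bigrams in the token stream, which is what keeps an order-2 chain over these tokens stochastic.

    from collections import Counter

    def bpe_tokenize(text, max_vocab=800):
        # Start from single characters and repeatedly merge the most frequent
        # adjacent pair until the vocabulary hits max_vocab (or no pair repeats).
        tokens = list(text)
        vocab = set(tokens)
        while len(vocab) < max_vocab:
            pairs = Counter(zip(tokens, tokens[1:]))
            if not pairs:
                break
            (a, b), count = pairs.most_common(1)[0]
            if count < 2:
                break        # uncapped BPE stops here: no bigram repeats any more
            merged = a + b
            vocab.add(merged)
            out, i = [], 0
            while i < len(tokens):
                if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                    out.append(merged)
                    i += 2
                else:
                    out.append(tokens[i])
                    i += 1
            tokens = out
        return tokens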
Before limiting, BPE was generating 894 distinct tokens. Even a slight limit (800) stops it from being deterministic.
> I Fed 24 years of My Blog Postly coherent. We need to be careful about not increasing the order too much. In fact, if we increase the order of the model to 5, the generated text becomes very dry and factual
It's hard to judge how coherent the text is versus the author's trigram approach, because the text I'm using to build my model has incoherent phrases in it anyway.
Anyways, Markov models are a lot of fun!
I used it as a kind of “dream well” whenever I wanted to draw some muse from the same deep spring. It felt like a spiritual successor to what I used to do as a kid: flipping to a random page in an old 1950s Funk & Wagnalls dictionary and using whatever I found there as a writing seed.
Here’s a link: https://botnik.org/content/harry-potter.html
IIRC there was some research on "infini-gram", a very large n-gram model, that allegedly got performance close to LLMs in some domains a couple of years back.
Usage:
hailo -t corpus.txt -b brain.brn
Where "corpus.txt" should be a file with one sentence per line.
Easy to do with sed/awk/perl.
hailo -b brain.brn
This spawns the chatbot with your trained brain. By default Hailo chooses the easy engine. If you want something more "realistic", pick the advanced one mentioned in 'perldoc hailo' with the -e flag.
Giving 24 years of your experience, thoughts, and lifetime to us.
This is special in these times of wondering, baiting, and consuming only.
https://archive.org/details/Babble_1020
A fairly prescient example of how long ago 4 years was:
https://forum.winworldpc.com/discussion/12953/software-spotl...
cpanm -n local::lib
cpanm -n Hailo
~/perl5/bin/hailo -E Scored -t corpus.txt -b brain.brn
~/perl5/bin/hailo -b brain.brn
As corpus.txt, you can use, for instance, a book from Gutenberg, cleaned up with a Perl/sed command.
I forgot to put the '-E' flag in my previous comments, so here it is. It selects a more 'complex' engine, so the text output looks less like gibberish.
Here's an npm package of a Markov model if you just want to play with it on localhost or somewhere else: https://github.com/Aperocky/weighted-markov-generator
Except we fine-tuned GPT-2 instead. (As was the fashion at the time!)
We used this one, I think https://github.com/minimaxir/gpt-2-simple
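For anyone curious, the gpt-2-simple workflow is roughly the following. This is reconstructed from memory of that repo's README, not our actual script; the file name and step count are placeholders, and argument names may differ between versions.

    import gpt_2_simple as gpt2

    gpt2.download_gpt2(model_name="124M")   # grab the smallest pretrained GPT-2

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess,
                  dataset="chatlog.txt",    # placeholder: exported chat logs
                  model_name="124M",
                  steps=1000)               # a few hours on a consumer GPU

    gpt2.generate(sess)                     # sample text in the fine-tuned voice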
I think it took 2-3 hours on my friend's Nvidia something.
The result was absolutely hilarious. It was halfway between a Markov chain and what you'd expect from a very small LLM these days. Completely absurd nonsense, yet eerily coherent.
Also, it picked up enough of our personality and speech patterns to shine a very low resolution mirror on our souls...
###
Andy: So here's how you get a girlfriend:
1. Start making silly faces
2. Hold out your hand for guys to swipe
3. Walk past them
4. Ask them if they can take their shirt off
5. Get them to take their shirt off
6. Keep walking until they drop their shirt
Andy: Can I state explicitly this is the optimal strategy