Hacker News

Show HN: Tiny Diffusion – A character-level text diffusion model from scratch

152 points by nathan-barry

by simonw

0 subcomments

This is really neat.
I noticed the diffusion-process.py demo was using matplotlib in a window, but I figured it would be cute if it used a terminal UI instead - so I had Claude Code convert it to use curses. Code and demo GIF here: https://gist.github.com/simonw/9033ebd8dd17b4c0ad101ddda7a54...

by mlmonkey

0 subcomments

I'm curious: has there been any work done on generating embedding vectors instead of discrete tokens via diffusion? What would that look like? Please point me to some references. Thanks!

by yugretcx

4 subcomments

Why do these text diffusion demos always look like the number of allowed tokens is fixed for a specific unfilled region?
Is this the case?
I.e., if the region only has four tokens (here, characters) but the model calculates that the best word is “forget”, does it just abandon the best fit or truncate it to fit?
Are there text diffusion models with lax infill directives?
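
To make the question concrete, here is a toy sketch of fixed-length masked infilling in plain Python. It assumes the usual masked-diffusion setup, where the canvas length is chosen up front and each step only swaps mask symbols for characters. `MASK`, `toy_denoise_step`, `fill`, and `propose` are hypothetical names, and `propose` is a hard-coded stand-in for a trained model, not this project's code.

```python
import random

MASK = "_"  # hypothetical mask symbol, chosen for this sketch only


def toy_denoise_step(seq, propose, rng):
    """Replace one randomly chosen masked slot; the length never changes."""
    masked = [i for i, ch in enumerate(seq) if ch == MASK]
    if not masked:
        return seq
    i = rng.choice(masked)
    out = list(seq)
    out[i] = propose(seq, i)
    return out


def fill(template, propose, seed=0):
    """Run toy denoising steps until no masked slots remain."""
    rng = random.Random(seed)
    seq = list(template)
    while MASK in seq:
        seq = toy_denoise_step(seq, propose, rng)
    return "".join(seq)


GAP_START = 2   # index of the first masked slot in the template below
GUESS = "miss"  # assumed four-letter guess that happens to fit the gap


def propose(seq, i):
    """Stand-in for a model: returns the character for slot i of the gap."""
    return GUESS[i - GAP_START]


template = "I ____ it"
result = fill(template, propose)
print(result)
```

Nothing in this loop can insert or delete a slot, so a six-letter word like “forget” has nowhere to go in a four-slot gap. Whether a given model has a variable-length infill mechanism is a separate design question that this sketch does not address.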

by Majromax

1 subcomment

The basic MLP block in this model uses a ReLU^2 activation function (x <- ReLU(x)^2). That seems to be copied from the nanochat project, and it's not present in nanoGPT. Is there some documentation on the choice of this activation function?
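
For anyone unfamiliar with the activation, here is a minimal sketch in plain Python of what x <- ReLU(x)^2 computes, along with its derivative. The function names are hypothetical and this is not copied from the repo or from nanochat.

```python
def relu_squared(x: float) -> float:
    """ReLU followed by squaring: max(x, 0) ** 2."""
    r = max(x, 0.0)
    return r * r


def relu_squared_grad(x: float) -> float:
    """Derivative of ReLU(x)^2, which is 2 * max(x, 0)."""
    return 2.0 * max(x, 0.0)


values = [-2.0, -0.5, 0.0, 0.5, 2.0]
activations = [relu_squared(v) for v in values]
gradients = [relu_squared_grad(v) for v in values]

print(activations)
print(gradients)
```

One visible difference from plain ReLU is that the derivative is continuous at zero instead of jumping from 0 to 1. Whether that is the reason for the choice here is exactly the open question.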

by gdiamos

2 subcomments

One year later and there is still no inference engine for diffusion LLMs.
Students looking for a project to break into AI - please!

by embedding-shape

1 subcomment

Fun project, easy to understand, with nice-looking results: everything one could ask for! I played around with it locally, picked off some low-hanging optimizations without making it much more complicated, and was gonna send over a PR. But then I noticed there is no license attached to the project. What are your plans regarding licensing?

by volodia

0 subcomments

There is also this one that was released in October: https://github.com/kuleshov/char-mdlm

by tell_me_whai

0 subcomments

Looks fun, thanks for sharing. I see you're implementing Game of Life sampling; what's the reasoning for using this logic?

by doppelgunner

0 subcomments