Hacker News

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

51 points by vforno

by AndReics

1 subcomments

Wow, that's really cool, I'll definitely check it out! I've played around with machine learning algorithms built from scratch in C/CUDA too, but once I hit the CUDA part I kind of just left it to the side. I'm curious: how did you use CUDA to optimize the matrix multiplications? How optimized is training, and does it take much longer than using PyTorch?

0 subcomment

by tdesilva

1 subcomments

Mentioning neural ODEs doesn't make sense here, as they're unrelated. Basically any transformer implementation uses residual connections, but that doesn't mean you're training a neural ODE here.
Also consider getting rid of the em-dashes. I don't know if you mostly vibe-coded this or not, but the README is pretty clearly AI generated.
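
To make the residual point above concrete, here is a minimal sketch in C. It is not taken from the NanoEuler source: `toy_block`, `euler_step`, and `residual_step` are hypothetical names, and `toy_block` is a stand-in (f(x) = -x) for a real attention or MLP sub-block. It only shows that a residual update x + f(x) has the same form as one forward Euler step of dx/dt = f(x) with step size h = 1. As neural ODEs are usually described, the model instead integrates a learned derivative over continuous depth with an ODE solver, which a fixed stack of residual layers does not do.

```c
#include <stddef.h>

/* Hypothetical stand-in for a transformer sub-block (attention or MLP).
   Assumption for illustration only: f(x) = -x, applied elementwise. */
static void toy_block(const float *x, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = -x[i];
    }
}

/* One forward Euler step of dx/dt = f(x): x <- x + h * f(x).
   scratch must have room for n floats. */
static void euler_step(float *x, float *scratch, size_t n, float h) {
    toy_block(x, scratch, n);
    for (size_t i = 0; i < n; i++) {
        x[i] += h * scratch[i];
    }
}

/* A plain residual connection: x <- x + f(x).
   This has the same form as euler_step with h fixed to 1. */
static void residual_step(float *x, float *scratch, size_t n) {
    toy_block(x, scratch, n);
    for (size_t i = 0; i < n; i++) {
        x[i] += scratch[i];
    }
}
```

Calling `residual_step` and `euler_step` with `h = 1.0f` on the same input gives the same output, which is the whole of the formal resemblance; every residual network has it, so on its own it says nothing about whether a model is trained as a neural ODE.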

by ali_chherawalla

1 subcomments

by isatty

1 subcomments

by ericb

1 subcomments

by valentynkit

0 subcomment

by Chu4eeno

3 subcomments

Very weird coding style. Did you run astyle --style=python on the C code?
Also, your LLM left a comment in the CUDA source saying it is untested. Does the CUDA code actually work?