Hypernetworks: Neural Networks for Hierarchical Data
96 points by mkmccjr
by QueensGambit
0 subcomments
Factorization is key here. It separates dataset-level structure from observation-level computation so the model doesn't waste capacity rediscovering structure.
I've been arguing the same for code generation. LLMs flatten parse trees into token sequences, then burn compute reconstructing hierarchy as hidden states. Graph transformers could be a good solution for both: https://manidoraisamy.com/ai-mother-tongue.html
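For concreteness, here's a toy PyTorch sketch (mine, not from the post) of that factorization: a dataset-level embedding goes through a hypernetwork that emits the weights of a small per-observation MLP, so shared structure lives in the embedding and the generated net only does observation-level computation.

    # Toy sketch, not the post's code: hypernetwork maps a dataset-level
    # embedding to the weights of a small per-observation MLP.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HyperMLP(nn.Module):
        def __init__(self, embed_dim=16, in_dim=8, hidden_dim=32, out_dim=1):
            super().__init__()
            self.in_dim, self.hidden_dim, self.out_dim = in_dim, hidden_dim, out_dim
            # Total parameter count of the target (observation-level) MLP.
            n_params = (in_dim * hidden_dim + hidden_dim) + (hidden_dim * out_dim + out_dim)
            # Hypernetwork: dataset embedding -> flat weight vector.
            self.hyper = nn.Sequential(
                nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, n_params)
            )

        def forward(self, z, x):
            # z: (embed_dim,) dataset-level embedding; x: (batch, in_dim) observations.
            w = self.hyper(z)
            i = 0
            W1 = w[i:i + self.in_dim * self.hidden_dim].view(self.hidden_dim, self.in_dim)
            i += self.in_dim * self.hidden_dim
            b1 = w[i:i + self.hidden_dim]; i += self.hidden_dim
            W2 = w[i:i + self.hidden_dim * self.out_dim].view(self.out_dim, self.hidden_dim)
            i += self.hidden_dim * self.out_dim
            b2 = w[i:i + self.out_dim]
            h = F.relu(F.linear(x, W1, b1))   # observation-level computation
            return F.linear(h, W2, b2)        # with weights generated per dataset

    model = HyperMLP()
    z = torch.randn(16)       # one embedding per dataset/group
    x = torch.randn(4, 8)     # observations from that dataset
    y = model(z, x)           # -> (4, 1)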
by stephantul
0 subcomments
What a good post! I loved the takeaways at the end of each section.
I think it would maybe get more traction if the code were in PyTorch or JAX. It’s been a long while since I’ve seen people use Keras.
by joefourier
2 subcomments
Odd that the author didn’t try giving a latent embedding to the standard neural network (or modulating the activations with a FiLM layer) and instead used static embeddings as the baseline. There’s no real advantage to using a hypernetwork: they tend to be more unstable and difficult to train, and they scale poorly unless you train a low-rank adaptation.
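To make the alternative concrete, here's a rough PyTorch sketch (toy dimensions, not the post's code) of conditioning a standard MLP on a latent embedding with a FiLM-style scale and shift instead of generating full weight matrices:

    # Toy sketch: FiLM conditioning as the cheaper baseline.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FiLMMLP(nn.Module):
        def __init__(self, embed_dim=16, in_dim=8, hidden_dim=32, out_dim=1):
            super().__init__()
            self.fc1 = nn.Linear(in_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, out_dim)
            # FiLM generator: embedding -> per-feature (gamma, beta).
            self.film = nn.Linear(embed_dim, 2 * hidden_dim)

        def forward(self, z, x):
            gamma, beta = self.film(z).chunk(2, dim=-1)
            h = F.relu(gamma * self.fc1(x) + beta)   # modulate hidden activations
            return self.fc2(h)

    model = FiLMMLP()
    y = model(torch.randn(16), torch.randn(4, 8))    # embedding + observations -> (4, 1)

Same conditioning signal, but the parameter count stays fixed as the target network grows, which is a big part of why it trains more stably.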
by keepamovin
0 subcomments
This is actually the way to AGI, ngl. Come back when it lands and see that it's right.