Hypernetworks: Neural Networks for Hierarchical Data
96 points by mkmccjr
by QueensGambit
0 subcomments
Factorization is key here. It separates dataset-level structure from observation-level computation so the model doesn't waste capacity rediscovering structure.
I've been arguing the same for code generation. LLMs flatten parse trees into token sequences, then burn compute reconstructing hierarchy as hidden states. Graph transformers could be a good solution for both: https://manidoraisamy.com/ai-mother-tongue.html
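For concreteness, here's a toy PyTorch sketch (mine, not from the post) of that factorization: a dataset-level embedding goes through a hypernetwork that emits the weights of a small per-observation MLP, so shared structure lives in the embedding and the generated net only does observation-level computation.

    # Toy sketch, not the post's code: hypernetwork maps a dataset-level
    # embedding to the weights of a small per-observation MLP.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HyperMLP(nn.Module):
        def __init__(self, embed_dim=16, in_dim=8, hidden_dim=32, out_dim=1):
            super().__init__()
            self.in_dim, self.hidden_dim, self.out_dim = in_dim, hidden_dim, out_dim
            # Total parameter count of the target (observation-level) MLP.
            n_params = (in_dim * hidden_dim + hidden_dim) + (hidden_dim * out_dim + out_dim)
            # Hypernetwork: dataset embedding -> flat weight vector.
            self.hyper = nn.Sequential(
                nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, n_params)
            )

        def forward(self, z, x):
            # z: (embed_dim,) dataset-level embedding; x: (batch, in_dim) observations.
            w = self.hyper(z)
            i = 0
            W1 = w[i:i + self.in_dim * self.hidden_dim].view(self.hidden_dim, self.in_dim)
            i += self.in_dim * self.hidden_dim
            b1 = w[i:i + self.hidden_dim]; i += self.hidden_dim
            W2 = w[i:i + self.hidden_dim * self.out_dim].view(self.out_dim, self.hidden_dim)
            i += self.hidden_dim * self.out_dim
            b2 = w[i:i + self.out_dim]
            h = F.relu(F.linear(x, W1, b1))   # observation-level computation
            return F.linear(h, W2, b2)        # with weights generated per dataset

    model = HyperMLP()
    z = torch.randn(16)       # one embedding per dataset/group
    x = torch.randn(4, 8)     # observations from that dataset
    y = model(z, x)           # -> (4, 1)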
by stephantul
0 subcomments
What a good post! I loved the takeaways at the end of each section.
I think it would maybe get more traction if the code were in PyTorch or JAX. It’s been a long while since I’ve seen people use Keras.
by joefourier
2 subcomments
Odd that the author didn’t try giving a latent embedding to the standard neural network (or modulating the activations with a FiLM layer) and instead used static embeddings as the baseline. There’s no real advantage to using a hypernetwork: they tend to be more unstable and difficult to train, and they scale poorly unless you train a low-rank adaptation.
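To make the alternative concrete, here's a rough PyTorch sketch (toy dimensions, not the post's code) of conditioning a standard MLP on a latent embedding with a FiLM-style scale and shift instead of generating full weight matrices:

    # Toy sketch: FiLM conditioning as the cheaper baseline.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FiLMMLP(nn.Module):
        def __init__(self, embed_dim=16, in_dim=8, hidden_dim=32, out_dim=1):
            super().__init__()
            self.fc1 = nn.Linear(in_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, out_dim)
            # FiLM generator: embedding -> per-feature (gamma, beta).
            self.film = nn.Linear(embed_dim, 2 * hidden_dim)

        def forward(self, z, x):
            gamma, beta = self.film(z).chunk(2, dim=-1)
            h = F.relu(gamma * self.fc1(x) + beta)   # modulate hidden activations
            return self.fc2(h)

    model = FiLMMLP()
    y = model(torch.randn(16), torch.randn(4, 8))    # embedding + observations -> (4, 1)

Same conditioning signal, but the parameter count stays fixed as the target network grows, which is a big part of why it trains more stably.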
by keepamovin
0 subcomments
This is actually the way to AGI, ngl. Come back when it lands and see that it's right.