To make matters even more surprising, if you project the random walk trajectory down into these PCA subspaces, the projections no longer look random at all. Instead the trajectory traces out a Lissajous curve. (For example, see figure 1 of this paper: https://proceedings.neurips.cc/paper/2018/file/7a576629fef88...)
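Here's a minimal sketch of that effect (my own code, not the paper's): simulate a long lattice walk in 10 dimensions, run PCA on the trajectory, and plot the top two components against each other. You get a smooth Lissajous-like arc, not noise.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    d, n_steps = 10, 100_000

    # Each step: pick one of the 10 dimensions, move +1 or -1 along it.
    dims = rng.integers(0, d, size=n_steps)
    signs = rng.choice([-1, 1], size=n_steps)
    steps = np.zeros((n_steps, d))
    steps[np.arange(n_steps), dims] = signs
    trajectory = np.cumsum(steps, axis=0)

    # PCA via SVD of the mean-centered trajectory.
    centered = trajectory - trajectory.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:2].T  # coordinates in the top-2 PCA plane

    # Component k against time looks like roughly k half-periods of a
    # cosine; component 1 against component 2 traces a Lissajous figure.
    plt.plot(proj[:, 0], proj[:, 1], lw=0.5)
    plt.savefig("walk_pca.png")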
This confused me a bit. To clarify: at each step, the random walker selects a dimension (with probability 1/10 for any given dimension), and then chooses a direction along that dimension (positive or negative, each with probability 1/2). There are 20 possible moves to choose from at any step.
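Just to make the count explicit, here's the move set as I read it (a sketch, not code from the article):

    from itertools import product

    # 10 dimensions x 2 signs = 20 moves, each a unit step along one axis.
    moves = [tuple(sign if i == dim else 0 for i in range(10))
             for dim, sign in product(range(10), (1, -1))]
    assert len(moves) == 20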
I believe the probability is actually
0.2 (the probability of the node itself being a 5) ×
0.8^10 (the probability that all 10 of its neighbors are ∈ {1,2,3,4}),
which is ~0.021, or around 2%. That makes much more sense: if 18% of the nodes were peaks, they wouldn't be rare at all.
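A quick Monte Carlo sanity check of that figure, assuming node values are i.i.d. uniform on {1,...,5} and each node has 10 neighbors (my assumptions, matching the calculation above):

    import random

    trials = 200_000
    hits = 0
    for _ in range(trials):
        center = random.randint(1, 5)
        neighbors = [random.randint(1, 5) for _ in range(10)]
        # The event computed above: the node is a 5 and every neighbor is lower.
        if center == 5 and all(n < 5 for n in neighbors):
            hits += 1

    print(hits / trials)    # ~0.021, give or take sampling noise
    print(0.2 * 0.8 ** 10)  # 0.02147..., the closed-form value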
Are there methods that specifically apply this idea?
I guess this is a good explanation for why deep learning isn't just automatically impossible: if local minima were everywhere, training would never get anywhere. But on the other hand, usually the goal isn't to add more and more parameters; it's to add just enough that common features can be identified, but not enough to "memorize the dataset," and to design an architecture that is flexible but still quite restricted, so that it can't represent just any function. And of course in many cases (especially when there's less data) it makes sense to manually design transformations from the high dimensional space to a lower dimensional one that contains less noise and can be modeled more easily.
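For what it's worth, a toy version of that last point, with PCA standing in for the hand-designed transformation (the dataset and the 16-component choice are mine, purely for illustration):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    X, y = load_digits(return_X_y=True)            # 64-dimensional inputs
    model = make_pipeline(PCA(n_components=16),    # 64 -> 16 dimensions
                          LogisticRegression(max_iter=1000))
    print(cross_val_score(model, X, y).mean())     # accuracy on the reduced inputs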
The article feels connected to the manifold hypothesis, where the function we're modeling has some projection into a low dimensional space, making it possible to model at all. I could imagine a similar thing where, if a potential function has lots of ridges, you can "glue it together" so all the level sets line up, and that corresponds to some lower dimensional optimization problem that's easier to solve. Really interesting, and I found it super clearly written.
https://www.youtube.com/watch?v=iH2kATv49rc
Turns out there is a very interesting theorem by Pólya about random walks that separates 1- and 2-dimensional random walks from higher dimensional ones: a simple random walk returns to its starting point with probability 1 in one or two dimensions, but in three or more dimensions there's a positive chance it never comes back. I thought I'd link this video because it's so well done.
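You can see the dimension split with a rough simulation (a sketch I put together; the horizon is finite, so these fractions undershoot the true return probabilities of 1, 1, and ~0.34 for d = 1, 2, 3):

    import numpy as np

    rng = np.random.default_rng(0)

    def return_fraction(d, n_walks=2000, n_steps=10_000):
        # Fraction of simple random walks in Z^d that revisit the origin
        # within n_steps.
        returned = 0
        for _ in range(n_walks):
            dims = rng.integers(0, d, size=n_steps)
            signs = rng.choice([-1, 1], size=n_steps)
            steps = np.zeros((n_steps, d), dtype=int)
            steps[np.arange(n_steps), dims] = signs
            pos = np.cumsum(steps, axis=0)
            if np.any(np.all(pos == 0, axis=1)):
                returned += 1
        return returned / n_walks

    for d in (1, 2, 3):
        print(d, return_fraction(d))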