To make matters even more surprising, if you project the random walk trajectory down into these PCA subspaces, the projections no longer look random at all. Instead the trajectory traces out a Lissajous curve. (For example, see figure 1 of this paper: https://proceedings.neurips.cc/paper/2018/file/7a576629fef88...)
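Here's a minimal sketch of that effect (my own code, not the paper's): simulate a long lattice walk in 10 dimensions, run PCA on the trajectory, and plot the top two components against each other. You get a smooth Lissajous-like arc, not noise.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    d, n_steps = 10, 100_000

    # Each step: pick one of the 10 dimensions, move +1 or -1 along it.
    dims = rng.integers(0, d, size=n_steps)
    signs = rng.choice([-1, 1], size=n_steps)
    steps = np.zeros((n_steps, d))
    steps[np.arange(n_steps), dims] = signs
    trajectory = np.cumsum(steps, axis=0)

    # PCA via SVD of the mean-centered trajectory.
    centered = trajectory - trajectory.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:2].T  # coordinates in the top-2 PCA plane

    # Component k against time looks like roughly k half-periods of a
    # cosine; component 1 against component 2 traces a Lissajous figure.
    plt.plot(proj[:, 0], proj[:, 1], lw=0.5)
    plt.savefig("walk_pca.png")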
This confused me a bit. To clarify: at each step, the random walker selects a dimension (with probability 1/10 for any given dimension), and then chooses a direction along that dimension (positive or negative, each with probability 1/2). There are 20 possible moves to choose from at any step.
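Just to make the count explicit, here's the move set as I read it (a sketch, not code from the article):

    from itertools import product

    # 10 dimensions x 2 signs = 20 moves, each a unit step along one axis.
    moves = [tuple(sign if i == dim else 0 for i in range(10))
             for dim, sign in product(range(10), (1, -1))]
    assert len(moves) == 20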
I believe the probability is actually
0.2 (the probability of the node itself being a 5) ×
0.8^10 (the probability that all 10 of its neighbors are ∈ {1,2,3,4}),
which is ~0.021, or around 2%. That makes much more sense: if 18% of the nodes were peaks, they wouldn't be rare at all.
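A quick Monte Carlo sanity check of that figure, assuming node values are i.i.d. uniform on {1,...,5} and each node has 10 neighbors (my assumptions, matching the calculation above):

    import random

    trials = 200_000
    hits = 0
    for _ in range(trials):
        center = random.randint(1, 5)
        neighbors = [random.randint(1, 5) for _ in range(10)]
        # The event computed above: the node is a 5 and every neighbor is lower.
        if center == 5 and all(n < 5 for n in neighbors):
            hits += 1

    print(hits / trials)    # ~0.021, give or take sampling noise
    print(0.2 * 0.8 ** 10)  # 0.02147..., the closed-form value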
Are there methods that specifically apply this idea?
I guess this is a good explanation for why deep learning isn't just automatically impossible: if local minima were everywhere, training would never get anywhere. But on the other hand, usually the goal isn't to add more and more parameters; it's to add just enough that common features can be identified, but not enough to "memorize the dataset," and to design an architecture that is flexible but still quite restricted, so that it can't represent just any function. And of course in many cases (especially when there's less data) it makes sense to manually design transformations from the high dimensional space to a lower dimensional one that contains less noise and can be modeled more easily.
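For what it's worth, a toy version of that last point, with PCA standing in for the hand-designed transformation (the dataset and the 16-component choice are mine, purely for illustration):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    X, y = load_digits(return_X_y=True)            # 64-dimensional inputs
    model = make_pipeline(PCA(n_components=16),    # 64 -> 16 dimensions
                          LogisticRegression(max_iter=1000))
    print(cross_val_score(model, X, y).mean())     # accuracy on the reduced inputs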
The article feels connected to the manifold hypothesis, where the function we're modeling has some projection into a low dimensional space, making it possible to model at all. I could imagine a similar thing where, if a potential function has lots of ridges, you can "glue it together" so all the level sets line up, and that corresponds to some lower dimensional optimization problem that's easier to solve. Really interesting, and I found it super clearly written.
https://www.youtube.com/watch?v=iH2kATv49rc
Turns out there is a very interesting theorem by Pólya about random walks that separates 1- and 2-dimensional random walks from higher dimensional ones: a simple random walk returns to its starting point with probability 1 in one or two dimensions, but in three or more dimensions there's a positive chance it never comes back. I thought I'd link this video because it's so well done.
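You can see the dimension split with a rough simulation (a sketch I put together; the horizon is finite, so these fractions undershoot the true return probabilities of 1, 1, and ~0.34 for d = 1, 2, 3):

    import numpy as np

    rng = np.random.default_rng(0)

    def return_fraction(d, n_walks=2000, n_steps=10_000):
        # Fraction of simple random walks in Z^d that revisit the origin
        # within n_steps.
        returned = 0
        for _ in range(n_walks):
            dims = rng.integers(0, d, size=n_steps)
            signs = rng.choice([-1, 1], size=n_steps)
            steps = np.zeros((n_steps, d), dtype=int)
            steps[np.arange(n_steps), dims] = signs
            pos = np.cumsum(steps, axis=0)
            if np.any(np.all(pos == 0, axis=1)):
                returned += 1
        return returned / n_walks

    for d in (1, 2, 3):
        print(d, return_fraction(d))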