Show HN: The Hessian of tall-skinny networks is easy to invert
23 points by rahimiali
by MontyCarloHall
1 subcomment
>If the Hessian-vector product is Hv for some fixed vector v, we're interested in solving Hx=v for x. The hope is to soon use this as a preconditioner to speed up stochastic gradient descent.
Silly question, but if you have some clever way to compute the inverse Hessian, why not go all the way and use it for Newton's method, rather than as a preconditioner for SGD?
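The quoted setup (using Hessian-vector products Hv to solve Hx=v) can be sketched matrix-free with conjugate gradients. This is an illustrative example, not the submission's code: the quadratic loss, the finite-difference `hvp`, and all names here are stand-ins, and in a real network one would get Hv from reverse-over-forward autodiff instead.

```python
# Sketch: solving H x = v using only Hessian-vector products, never forming H.
# f(w) = 0.5 w^T A w + b^T w is a hypothetical stand-in for a network loss.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive definite Hessian
b = rng.standard_normal(n)

def grad(w):
    return A @ w + b

def hvp(v, eps=1e-6):
    # Finite-difference Hessian-vector product: H v ~ (g(w + eps v) - g(w)) / eps.
    # Exact here because grad is linear; autodiff would replace this in practice.
    w = np.zeros(n)
    return (grad(w + eps * v) - grad(w)) / eps

H = LinearOperator((n, n), matvec=hvp)
v = rng.standard_normal(n)
x, info = cg(H, v)                 # matrix-free solve of H x = v
residual = np.linalg.norm(A @ x - v) / np.linalg.norm(v)
print(info, residual)              # info == 0 means CG converged
```

The same matvec-only interface is what makes Hv useful either inside a full Newton step or as a preconditioner applied to stochastic gradients.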
by Lerc
2 subcomments
I am not a mathematician, but I do enough weird stuff that I encounter things referring to Hessians, yet I don't really know what they are, because everyone who writes about them does so in terms that assume the reader already knows.
Any hints? The Battenberg graphics of matrices?
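For the question above: the Hessian of a scalar function f: R^n -> R is just the n-by-n matrix of second partial derivatives, H[i][j] = d²f/(dx_i dx_j). A minimal sketch (the function and finite-difference helper are illustrative choices, not anything from the submission):

```python
# The Hessian of f(x, y) = x^2 * y is [[2y, 2x], [2x, 0]].
import numpy as np

def f(p):
    x, y = p
    return x**2 * y

def hessian_fd(f, p, eps=1e-5):
    # Central finite differences for each second partial derivative;
    # a stand-in for what autodiff frameworks compute exactly.
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pp = p.copy(); pp[i] += eps; pp[j] += eps
            pm = p.copy(); pm[i] += eps; pm[j] -= eps
            mp = p.copy(); mp[i] -= eps; mp[j] += eps
            mm = p.copy(); mm[i] -= eps; mm[j] -= eps
            H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4 * eps**2)
    return H

H = hessian_fd(f, np.array([1.0, 2.0]))
print(np.round(H, 3))  # ~ [[4, 2], [2, 0]] since 2y = 4 and 2x = 2 at (1, 2)
```

In optimization it plays the role of curvature: the gradient says which way is downhill, the Hessian says how quickly that slope changes in each direction.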
by jeffjeffbear
1 subcomment
I haven't looked into it in years, but would the inverse of a block bidiagonal matrix have some semiseparable structure? Maybe that would be good to look into?
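The structure the comment is asking about can be checked numerically. For a block lower bidiagonal matrix with identity diagonal blocks, the (i, j) block of the inverse is the product B_i B_{i-1} ... B_{j+1} of the subdiagonal blocks, which is the kind of structured (semiseparable-like) pattern that never needs to be formed densely. This is a generic linear-algebra sketch, not the submission's construction; block sizes and names are arbitrary.

```python
# Inverse of a block lower bidiagonal matrix: block (i, j) for i > j is the
# running product of the subdiagonal blocks between j and i.
import numpy as np

rng = np.random.default_rng(1)
k, m = 3, 4                                   # block size, number of blocks
Bs = [rng.standard_normal((k, k)) for _ in range(m - 1)]

# Assemble L: identity blocks on the diagonal, -B_i on the subdiagonal.
L = np.eye(k * m)
for i, B in enumerate(Bs):
    L[(i + 1) * k:(i + 2) * k, i * k:(i + 1) * k] = -B

Linv = np.linalg.inv(L)

# The (2, 0) block of L^{-1} should equal B_1 @ B_0.
block = Linv[2 * k:3 * k, 0:k]
expected = Bs[1] @ Bs[0]
print(np.allclose(block, expected))  # True
```

This follows from L = I - N with N block-subdiagonal and nilpotent, so L^{-1} = I + N + N² + ... terminates after m terms.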
by petters
1 subcomment
Would be great to see this work continued with some training runs