However, it's pretty obvious that they are related, since CCA (canonical correlation analysis) is (or should be) well known as one of the original unsupervised learning algorithms. It's the progenitor of the field. It works, and it always did, just like logistic regression for classification. Deep learning is about putting in huge computational effort for the extra few percent.
This is like saying that Gauss deserves the credit for LLMs because he came up with least-squares regression, which was the progenitor of supervised learning. Yes, there is a chain of discoveries leading back, but when you assign credit that far back, it's just insulting to the hard work that came in between.
Gauss and Hotelling are famous enough as it is.
(Before anyone asks, I'm not shilling for JEPA, I just think this argument is reductive for all of unsupervised and semi-supervised learning.)
OTOH prediction doesn't necessarily reflect causation either, but prediction is what JEPA is about and how our brain/intelligence works, and one of the great confirmations from LLMs is how powerful prediction errors are as a learning signal.
JEPA appears to be a step in the right direction of trying to build a brain rather than a language model - to use prediction the way the brain uses it, to predict the future (not a frozen historical training set), and to learn a real world model of how the world behaves. Any JEPA implementations I've read about use a Transformer as their predictive component, since prediction itself (and certainly not correlation) is not where JEPA is innovating. It is more about applying prediction to the right problem (assuming the goal is to implement animal/human intelligence): predicting sensory inputs at the right level of representation.
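To make "predicting at the level of representation" a bit more concrete, here is a toy sketch in plain Python. It is not any real JEPA implementation: `encode`, `predict` and `latent_loss` are hypothetical names, the encoder is a single linear layer with tanh, and the predictor is a linear map standing in for the Transformer. The only point it illustrates is that the prediction error is measured between embeddings, not between raw inputs.

```python
import math
import random


def encode(x, W):
    """Hypothetical encoder: one linear layer (W is d_in x d_lat) followed by tanh."""
    return [math.tanh(sum(xi * w for xi, w in zip(x, col))) for col in zip(*W)]


def predict(z, P):
    """Hypothetical predictor: a linear map in latent space (P is d_lat x d_lat)."""
    return [sum(zi * p for zi, p in zip(z, col)) for col in zip(*P)]


def latent_loss(context, target, W_ctx, W_tgt, P):
    """Mean squared error between the predicted and the actual target embedding."""
    z_hat = predict(encode(context, W_ctx), P)
    z_tgt = encode(target, W_tgt)  # treated as a fixed target in this sketch
    return sum((a - b) ** 2 for a, b in zip(z_hat, z_tgt)) / len(z_tgt)


# Toy data; the sizes and weights are arbitrary assumptions for illustration.
random.seed(0)
d_in, d_lat = 6, 3
W = [[random.gauss(0, 0.3) for _ in range(d_lat)] for _ in range(d_in)]
identity = [[1.0 if i == j else 0.0 for j in range(d_lat)] for i in range(d_lat)]
context = [random.gauss(0, 1) for _ in range(d_in)]
target = [random.gauss(0, 1) for _ in range(d_in)]

loss = latent_loss(context, target, W, W, identity)  # error in embedding space
same = latent_loss(context, context, W, W, identity)  # identical inputs give zero error
```

A real system would train the encoder and predictor on this error; the sketch only evaluates it once.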
A recent JEPA variant, Causal-JEPA, moves beyond just infilling to predict object state from object interactions (i.e. to learn causal predictive relationships).