2024 FFT

Time TBD

Block Coupling and its Correlation with Generalization in LLMs and ResNets

Vardan Papyan (U.Toronto)

Abstract: In this talk, we dive into the internal workings of both Large Language Models and ResNets by tracing input trajectories through model layers and analyzing the Jacobian matrices along them. We uncover a striking phenomenon, block coupling, in which the top singular vectors of these Jacobians become synchronized across inputs or depth as training progresses. Interestingly, the degree of coupling correlates with better generalization performance. Our findings shed light on the intricate interactions between input representations and suggest new pathways for understanding training dynamics, model generalization, and Neural Collapse.
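To make the idea of comparing Jacobian singular vectors across inputs more concrete, here is a minimal NumPy sketch. It is not the speaker's method. The toy residual block f(x) = x + W2 tanh(W1 x), the helper names (block_jacobian, top_singular_vectors, coupling_score), and the "coupling score" (mean absolute cosine between the top left singular vectors of the block Jacobian at different inputs) are all hypothetical choices made for illustration.

```python
import numpy as np

def block_jacobian(x, W1, W2):
    """Jacobian of the toy residual block f(x) = x + W2 @ tanh(W1 @ x) (hypothetical block)."""
    pre = W1 @ x
    d = 1.0 - np.tanh(pre) ** 2          # derivative of tanh at each pre-activation
    # d f / d x = I + W2 diag(d) W1; scaling the rows of W1 by d avoids forming diag(d)
    return np.eye(x.shape[0]) + W2 @ (d[:, None] * W1)

def top_singular_vectors(J):
    """Return the top left and top right singular vectors of J (unit norm)."""
    U, S, Vt = np.linalg.svd(J)
    return U[:, 0], Vt[0, :]

def coupling_score(inputs, W1, W2):
    """Hypothetical metric: mean |cosine| between top left singular vectors
    of the block Jacobian, over all pairs of inputs. abs() removes the SVD sign ambiguity."""
    tops = [top_singular_vectors(block_jacobian(x, W1, W2))[0] for x in inputs]
    n = len(tops)
    scores = [abs(tops[i] @ tops[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(scores))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, h = 16, 32
    W1 = rng.normal(size=(h, d)) / np.sqrt(d)
    W2_random = rng.normal(size=(d, h)) / np.sqrt(h)
    # A strong rank-one W2 pushes every input's Jacobian toward the same dominant output direction.
    a = np.zeros(d)
    a[0] = 1.0
    W2_coupled = 20.0 * np.outer(a, np.ones(h) / np.sqrt(h))
    inputs = [rng.normal(size=d) for _ in range(8)]
    print("random W2:  ", coupling_score(inputs, W1, W2_random))
    print("rank-one W2:", coupling_score(inputs, W1, W2_coupled))
```

A score near 1 means the Jacobians at different inputs share a dominant direction. Comparing across depth would instead mean comparing the top singular vectors of successive blocks at the same input; that variant is likewise an assumption here, not something specified in the abstract.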