Paper: SGD with large step sizes learns sparse features
Our paper "SGD with large step sizes learns sparse features" is available online! TL;DR: the loss stabilization that SGD achieves with large step sizes induces hidden dynamics that promote sparse feature learning. See also this Twitter thread for a quick summary of the main ideas.
