We present an asynchronous optimization algorithm for distributed learning that efficiently reduces the communications between a master and worker machines by randomly sparsifying the local updates. This sparsification makes it possible to lift the communication bottleneck often present in distributed learning setups, where computations are performed by workers on local data while a master machine coordinates their updates to optimize a global loss. We prove that, despite its sparse asynchronous communications, our algorithm allows for a fixed stepsize and benefits from a linear convergence rate in the strongly convex case. Moreover, for $\ell_1$-regularized problems, the algorithm identifies near-optimal sparsity patterns, so that all communications eventually become sparse. We further leverage this identification to improve our sparsification technique. We illustrate on real and synthetic data that this algorithm converges faster in terms of data exchanges.
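To make the idea of randomly sparsifying a local update concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes that each coordinate of a worker's update is selected independently with probability `p`, and that coordinates in an identified support are always sent. The names `sparsify_update`, `apply_sparse_update`, `p`, and `support` are hypothetical and chosen for illustration only.

```python
import random


def sparsify_update(delta, p, rng, support=()):
    """Return a sparse version of `delta` as an {index: value} dict.

    Assumption (illustrative only): each coordinate is kept independently
    with probability `p`; indices listed in `support` are always kept.
    """
    keep = set(support)
    for i in range(len(delta)):
        if rng.random() < p:
            keep.add(i)
    return {i: delta[i] for i in sorted(keep)}


def apply_sparse_update(x, sparse_delta):
    """Master side: add only the received coordinates to a copy of `x`."""
    out = list(x)
    for i, value in sparse_delta.items():
        out[i] += value
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    delta = [0.5, -1.0, 0.0, 2.0, 0.25, -0.75]
    # Send roughly a third of the coordinates, plus coordinate 3 always.
    message = sparsify_update(delta, p=1 / 3, rng=rng, support=[3])
    x_new = apply_sparse_update([0.0] * len(delta), message)
    print(len(message), "of", len(delta), "coordinates sent")
    print(x_new)
```

In this sketch the communication cost per exchange is the number of entries in the returned dictionary rather than the full dimension, which is the sense in which sparse updates reduce data exchanges.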