In this talk, we present a first-order optimization algorithm for distributed learning problems. We propose an efficient sparsification of proximal-gradient updates that alleviates the communication bottleneck of exchanging high-dimensional updates between the master machine and the worker machines. Using a sketch-and-project technique with projections onto near-optimal low-dimensional subspaces, we obtain a significant improvement in convergence with respect to the total amount of communication.
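As a rough illustration of the sketch-and-project idea (not the exact construction from the talk), the sketch below compresses a worker's update by projecting it onto the range of a random low-dimensional sketch matrix, so only a small vector of coefficients needs to be communicated; the function name, the Gaussian choice of sketch, and all dimensions are hypothetical.

```python
import numpy as np

def sketch_and_project_update(grad, k, rng):
    """Illustrative sketch: project a d-dimensional update `grad` onto the
    column space of a random d x k sketch matrix S (k << d).

    The worker only communicates the k-dimensional vector S.T @ grad; the
    master, knowing (or regenerating) S, reconstructs the projection
    S (S^T S)^{-1} S^T grad of the true update.
    """
    d = grad.shape[0]
    S = rng.standard_normal((d, k))      # random low-dimensional subspace (assumed Gaussian)
    compressed = S.T @ grad              # only k numbers are sent over the network
    # Master side: least-squares coefficients give the projection onto range(S)
    coeffs = np.linalg.solve(S.T @ S, compressed)
    return S @ coeffs                    # projected (sparsified) update

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 10_000, 100                   # communicate 100 floats instead of 10,000
    grad = rng.standard_normal(d)
    approx = sketch_and_project_update(grad, k, rng)
    print("relative error:", np.linalg.norm(grad - approx) / np.linalg.norm(grad))
```

In this toy version the subspace is chosen at random; the talk's contribution concerns choosing near-optimal subspaces so that the projected updates retain convergence guarantees per bit communicated.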