Scalable and Robust Sparse Subspace Clustering Using Randomized Clustering and Multilayer Graphs.
Sparse subspace clustering (SSC) is one of the current state-of-the-artmethod for partitioning data points into the union of subspaces, with strongtheoretical guarantees. However, it is not practical for large data sets as itrequires solving a LASSO problem for each data point, where the number ofvariables in each LASSO problem is the number of data points. To improve thescalability of SSC, we propose to select a few sets of anchor points using arandomized hierarchical clustering method, and, for each set of anchor points,solve the LASSO problems for each data point allowing only anchor points tohave a non-zero weight (this reduces drastically the number of variables). Thisgenerates a multilayer graph where each layer corresponds to a different set ofanchor points. Using the Grassmann manifold of orthogonal matrices, the sharedconnectivity among the layers is summarized within a single subspace. Finally,we use $k$-means clustering within that subspace to cluster the data points,similarly as done by spectral clustering in SSC. We show on both synthetic andreal-world data sets that the proposed method not only allows SSC to scale tolarge-scale data sets, but that it is also much more robust as it performssignificantly better on noisy data and on data with close susbspaces andoutliers, while it is not prone to oversegmentation.
Stay in the loop.
Subscribe to our newsletter for a weekly update on the latest podcast, news, events, and jobs postings.