Multi-view Cluster Analysis with Incomplete Data

Abstract.

Multi-view data are common in many scientific domains where data from different modalities are collected for each subject. Frequently, data entries are missing from some of the views. Cluster analysis that aims to partition subjects into clusters that are consistent across views is especially difficult when any of the views contains incomplete data. The latest multi-view co-clustering methods can find the same sample partition in different views, but they do not work with incomplete data. We propose an enhanced formulation for a family of multi-view co-clustering methods that introduces an auxiliary matrix in which each element indicates the probability that the corresponding data entry is observed, which makes these methods capable of co-clustering incomplete multi-view data. Compared with the simple strategy of removing subjects with missing entries, our approach uses all available data in the cluster analysis. Compared with common methods that impute missing data so that regular multi-view analytics can be applied, our approach is less sensitive to imputation uncertainty, which would be especially severe in data from medical treatment studies. We validated the proposed method in simulations by comparing it with the state of the art, and then applied it to a treatment study of heroin dependence, an analysis that would have been impossible with previous methods because of the number of missing-data patterns. Patients in the treatment study were naturally assessed in different feature spaces, such as the pre-, during- and post-treatment time windows. Our algorithm identified patient subgroups in which patients showed similarities in all three time windows, leading to the recognition of pre-treatment (baseline) features predictive of post-treatment outcomes.
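The role of the auxiliary matrix can be illustrated with a small sketch. It is not the software package offered on this page and not the formulation from the paper. It assumes, for illustration only, that the weights enter a single-view rank-one least-squares fit, so that an entry with weight 0 (missing) is ignored and an entry with weight 1 (observed) counts fully. The function name masked_rank_one, the matrices X and W, and the toy values are all hypothetical.

```python
def masked_rank_one(X, W, n_iter=200):
    """Fit X[i][j] ~ u[i] * v[j] by alternating weighted least squares.

    W[i][j] is a weight in [0, 1]; here it is 1.0 for an observed entry and
    0.0 for a missing one. This is an illustrative assumption, not the
    formulation used in the paper.
    """
    n, m = len(X), len(X[0])
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Update each row factor using only the weighted entries of its row.
        for i in range(n):
            num = sum(W[i][j] * X[i][j] * v[j] for j in range(m))
            den = sum(W[i][j] * v[j] ** 2 for j in range(m))
            if den > 0:
                u[i] = num / den
        # Update each column factor using only the weighted entries of its column.
        for j in range(m):
            num = sum(W[i][j] * X[i][j] * u[i] for i in range(n))
            den = sum(W[i][j] * u[i] ** 2 for i in range(n))
            if den > 0:
                v[j] = num / den
    return u, v


# Toy data: an exactly rank-one matrix with two entries treated as missing.
row_true = [1.0, 2.0, 3.0, 4.0]
col_true = [1.0, 0.5, 2.0]
X = [[a * b for b in col_true] for a in row_true]
W = [[1.0] * len(col_true) for _ in row_true]
for (i, j) in [(0, 1), (2, 2)]:
    X[i][j] = 0.0  # placeholder value; it is never used because W is 0 there
    W[i][j] = 0.0

u, v = masked_rank_one(X, W)
estimate_01 = u[0] * v[1]  # model value at the first missing position
estimate_22 = u[2] * v[2]  # model value at the second missing position
print(estimate_01, estimate_22)
```

In this sketch no subject is discarded and no value is imputed before fitting: the missing positions simply contribute nothing to the objective. Weights strictly between 0 and 1 would down-weight an entry rather than drop it.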

Click here to download the software package.
Click here to download the supplemental table.

The related paper was published in Information Sciences, vol. 494, pp. 278-293, 2019.

Guoqing Chao, Jiangwen Sun, Jin Lu, Jinbo Bi

Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA

An-Li Wang, Daniel D. Langleben

University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA

This is an open-source program for non-commercial use only. Please contact either Dr. Jinbo Bi (jinbo.bi@uconn.edu) or Guoqing Chao (guoqing.chao@uconn.edu) for updates on ongoing progress.

Contact Jinbo Bi (jinbo.bi@uconn.edu) for information about this page.