Collaborative Phenotype Inference from Comorbid Substance Use Disorders and Genotypes
Incomplete data are common in large-scale genetic studies of complex human diseases, such as substance use disorders. Despite great progress in genotype imputation, e.g., the IMPUTE2 method, considerably less progress has been made in meaningful inference of phenotypes. We design a novel approach that integrates features of comorbid conditions with an individual's genotypes to infer missing (unreported) symptoms of a disorder. The premise of our approach is that concurrent disorders, such as dependence on cocaine and dependence on opioids, have correlated symptoms and a shared biological basis. We adapt a matrix completion method to construct a bilinear model, based on interactions between genotypes and known symptoms of related disorders, that infers unknown values of another set of symptoms or phenotypes. We use this approach to infer substance use behavior from candidate genotypes (those with preliminary evidence of association with substance use disorder) and from known similarities in how drug abusers use two illicit drugs. Our sparse model can also identify the genotype-behavior interactions that are most relevant to the imputation of a drug abuse symptom. We develop an efficient stochastic and parallel algorithm, based on the linearized alternating direction method of multipliers, to solve the proposed optimization problem. Careful empirical evaluations against other advanced matrix completion methods, in both simulations and a case study, show that our approach not only significantly improves imputation accuracy but also achieves much better computational efficiency.
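As a rough illustration of the bilinear idea in the abstract, the Python sketch below fits a sparse interaction matrix W so that X W Y^T matches the observed entries of a phenotype matrix M, then uses the fit to fill in the unobserved entries. Everything in it is an assumption made for illustration: the names fit_bilinear and soft_threshold, the synthetic data, the plain L1-penalized least-squares objective, and the proximal gradient solver, which stands in for (and is not) the stochastic linearized ADMM algorithm developed in the paper. It is not the released software package.

```python
import numpy as np

def soft_threshold(A, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def fit_bilinear(X, Y, M, mask, lam=0.1, n_iter=2000):
    """Hypothetical helper: minimize
        0.5 * ||mask * (X W Y^T - M)||_F^2 + lam * ||W||_1
    by proximal gradient descent (ISTA).

    X    : (n, d) row features, e.g. genotypes of n individuals (assumed)
    Y    : (m, k) column features, e.g. symptom similarities (assumed)
    M    : (n, m) phenotype matrix; entries where mask == 0 are ignored
    mask : (n, m) array with 1 for observed entries and 0 for missing ones
    """
    W = np.zeros((X.shape[1], Y.shape[1]))
    # Step size from an upper bound on the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 * np.linalg.norm(Y, 2) ** 2)
    for _ in range(n_iter):
        residual = mask * (X @ W @ Y.T - M)   # error on observed entries only
        grad = X.T @ residual @ Y             # gradient of the smooth term
        W = soft_threshold(W - step * grad, step * lam)
    return W

# Synthetic example (made-up data, not from the study).
rng = np.random.default_rng(0)
n, d, m, k = 200, 10, 20, 4
X = rng.standard_normal((n, d))
Y = rng.standard_normal((m, k))
W_true = np.zeros((d, k))
idx = rng.choice(d * k, size=8, replace=False)
W_true.flat[idx] = rng.standard_normal(8)     # only a few true interactions
M = X @ W_true @ Y.T
mask = (rng.random((n, m)) < 0.7).astype(float)  # about 70% of entries observed

W_hat = fit_bilinear(X, Y, M * mask, mask)
M_hat = X @ W_hat @ Y.T                       # imputed phenotype matrix
missing = mask == 0
rel_err = np.linalg.norm((M_hat - M)[missing]) / np.linalg.norm(M[missing])
print(f"relative error on held-out entries: {rel_err:.3f}")
```

In this sketch, the nonzero entries of W_hat play the role of the selected feature interactions; the model and penalties actually used in the paper may differ.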
Click here to download the software package.
Jin Lu, Jiangwen Sun, Xinyu Wang, Henry R. Kranzler, Joel Gelernter, and Jinbo Bi
2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2017.