AUTHORS |
Rudolf Schill, Maren Klever, Andreas Lösch, Linda Hu, Stefan Vocht, Kevin Rupp, Lars Grasedyck, Rainer Spang, Niko Beerenwinkel |
ABSTRACT |
Cancers evolve by accumulating genetic alterations, such as mutations and copy number changes. The chronological order of these events is important for understanding the disease, but not directly observable from cross-sectional genomic data. Cancer progression models (CPMs), such as Mutual Hazard Networks (MHNs), reconstruct the progression dynamics of tumors by learning a network of causal interactions between genetic events from their co-occurrence patterns. However, current CPMs fail to include effects of genetic events on the observation of the tumor itself and assume that observation occurs independently of all genetic events. Since a dataset contains by definition only tumors at their moment of observation, neglecting any causal effects on this event leads to the “conditioning on a collider” bias: Events that make the tumor more likely to be observed appear anti-correlated, which results in spurious suppressive effects or masks promoting effects among genetic events. Here, we extend MHNs by modeling effects from genetic progression events on the observation event, thereby correcting for the collider bias. We derive an efficient tensor formula for the likelihood function and learn two models on somatic mutation datasets from the MSK-IMPACT study. In colon adenocarcinoma, we find a strong effect on observation by mutations in TP53, and in lung adenocarcinoma by mutations in EGFR. Compared to classical MHNs, this explains away many spurious suppressive interactions and uncovers several promoting effects. |
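The collider bias described in the abstract can be illustrated with a minimal static sketch. This is not the MHN model or the tensor likelihood from the paper. It assumes two hypothetical events, A and B, that occur independently with probability 0.5 each, and an invented observation probability that rises additively with each event. All names and numbers (`P_EVENT`, `p_observed`, the 0.1 and 0.4 coefficients) are illustrative assumptions, not values from the study.

```python
from itertools import product

# Assumed marginal probability of each hypothetical event (A and B are independent).
P_EVENT = 0.5


def p_observed(a, b):
    """Assumed probability that a tumor in state (a, b) is observed.

    Each event raises the chance of observation; the numbers are invented.
    """
    return 0.1 + 0.4 * a + 0.4 * b


def odds_ratio(weights):
    """Odds ratio of A and B from a table of (possibly unnormalized) state weights."""
    return (weights[(1, 1)] * weights[(0, 0)]) / (weights[(1, 0)] * weights[(0, 1)])


# State distribution over all tumors, observed or not: A and B are independent.
population = {
    (a, b): (P_EVENT if a else 1 - P_EVENT) * (P_EVENT if b else 1 - P_EVENT)
    for a, b in product((0, 1), repeat=2)
}

# A dataset contains only observed tumors, so each state is reweighted
# by its observation probability (conditioning on the collider).
observed = {state: p * p_observed(*state) for state, p in population.items()}

or_population = odds_ratio(population)
or_observed = odds_ratio(observed)

print(f"odds ratio among all tumors:      {or_population:.2f}")
print(f"odds ratio among observed tumors: {or_observed:.2f}")
```

Under these assumptions the odds ratio is 1 among all tumors but about 0.36 among observed tumors, so two causally unrelated events look mutually suppressive once only observed tumors are considered. A model that treats observation as independent of the events would have to attribute this pattern to a suppressive interaction between A and B.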