Sparse survival models in high-throughput cancer studies
Multivariate Methods & Survival Analysis Themes
Abstract: Sparsity is an essential feature of many contemporary data problems. Many health studies collect large amounts of genetic information on each patient. In some cases it is reasonable to assume that the underlying data-generating process is itself sparse, in the sense that only a few of the measured variables are involved in the survival process.
We propose an explicit method for constructing survival models along a path of monotonically decreasing sparsity. Our approach generalizes the so-called equiangular condition, known from least angle regression, to generalized linear models. Although the geometry involves the Fisher information in a way that is not obvious in the simple regression setting, the equiangular condition turns out to be equivalent to an intuitive condition imposed on the Rao score test statistics. In certain special cases the method can be adapted to yield L1-penalized GLM solution paths, but this is not its main purpose.
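As a rough illustration of the score-statistic view, the sketch below computes, for a Cox proportional hazards model evaluated at β = 0 (no covariates yet in the model), the standardized Rao score statistic r_j = u_j / sqrt(I_jj) for each covariate. It then checks an assumed form of the equiangular condition: all active covariates share the same |r_j| and no inactive covariate exceeds it. The use of the Cox partial likelihood with Breslow handling of ties, the hypothetical helper names `cox_rao_scores` and `satisfies_equiangular`, and this exact form of the condition are all assumptions made for illustration. The sketch only evaluates the starting point of a path; it does not follow the solution path itself and is not the authors' implementation.

```python
import math


def cox_rao_scores(times, events, X):
    """Standardized Rao score statistics r_j = u_j / sqrt(I_jj) of a Cox model at beta = 0.

    Hypothetical helper. At beta = 0 every subject has weight exp(0) = 1, so the
    risk-set moments are plain averages; tied event times (Breslow) each use the
    full risk set {k : t_k >= t_i}.
    """
    n, p = len(times), len(X[0])
    score = [0.0] * p  # u_j: partial-likelihood score for covariate j
    info = [0.0] * p   # I_jj: diagonal of the Fisher information
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [k for k in range(n) if times[k] >= times[i]]
        m = len(risk)
        for j in range(p):
            mean = sum(X[k][j] for k in risk) / m
            second = sum(X[k][j] ** 2 for k in risk) / m
            score[j] += X[i][j] - mean
            info[j] += second - mean ** 2
    return [u / math.sqrt(v) if v > 0 else 0.0 for u, v in zip(score, info)]


def satisfies_equiangular(r, active, tol=1e-8):
    """Assumed score form of the equiangular condition: all active covariates share
    a common |r_j|, and no inactive covariate exceeds that level."""
    if not active:
        return True
    level = abs(r[active[0]])
    same_level = all(abs(abs(r[j]) - level) <= tol for j in active)
    dominated = all(abs(r[j]) <= level + tol
                    for j in range(len(r)) if j not in active)
    return same_level and dominated


# Toy data (invented for illustration): 6 subjects, all observed to fail.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
X = [[5, 0], [4, 1], [3, 0], [2, 1], [1, 0], [0, 1]]

r = cox_rao_scores(times, events, X)
first = max(range(len(r)), key=lambda j: abs(r[j]))  # first covariate to enter
print(r, first, satisfies_equiangular(r, [first]))
```

On this toy data the first covariate, whose values decrease with event time, has the largest |r_j| and is the one that would enter the active set first; the second covariate alone does not satisfy the condition.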
The method itself defines sparsity more directly. Although computing the solution paths is not trivial, the method compares favorably with other path-following algorithms. We illustrate the method on a diffuse large-B-cell lymphoma dataset and on four high-throughput survival studies of prostate, ovarian, skin and colon cancer, all conducted within the last five years.