Hello, I am fascinated by the latent variable modeling capability of partial least squares and its extensions (e.g., partial least squares path modeling). I wonder whether this modeling paradigm has applications in survival analysis techniques such as Cox models. I browsed the survival analysis literature and found that current refinements of parameter estimation methods center on shrinkage (e.g., ridge regression, LASSO, etc.), with no obvious applications of latent variable modeling in this field.
Thank you!
One example found by Google
https://pubmed.ncbi.nlm.nih.gov/24105836/
Also
Partial Least Squares generalized regression covers survival analysis, logistic regression, Cox regression and many other types of analyses, and there is an R package that does this. The logistic partial least squares algorithm is something I have coded in SAS (but it's proprietary, owned by my employer, and I cannot share it).
Thank you very much for your time! I have just finished learning the bulk of PLS logistic modeling. But the book section I read gave little information on how to select the number of components. It just briefly said that cross-validation and goodness-of-fit statistics like the AIC and likelihood ratios can be used. Could you recommend more specific methods for that?
Before raising my questions here, I already knew that PLS logistic regression was a possible choice. But in survival analysis, where censoring is common, logistic regression and its generalizations are "incompatible" with missing data, so the quality of the final results is conditional on the quality of imputation. (I exclude here the generalizations that have moved so far from the original that they go by a different name rather than carrying the suffix "logistic regression", such as Cox regression, which is a de facto generalization of conditional logistic regression.) Therefore, I looked for approaches that handle missing data in other ways. It is true that, while Cox models can handle missing data in the dependent variable without imputation, imputation is still a must for independent variables with missing data. Even so, although it cannot be quantified by a single number, the dependence of the quality of the results on imputation may decrease.
Thank you again!
The PROC PLS documentation contains examples of how to select the number of dimensions using cross validation.
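As a rough sketch of the general idea (written in Python rather than SAS, and not taken from the PROC PLS documentation), k-fold cross validation for choosing the number of components could look like the following. The names `kfold_indices`, `select_components_by_cv`, and `fit_and_score` are hypothetical: `fit_and_score` stands for whatever routine fits a PLS logistic model with `h` components on the training rows and returns a loss (for example, the mean deviance) on the held-out rows.

```python
import random
from statistics import mean


def kfold_indices(n_obs, k, seed=0):
    """Shuffle row indices 0..n_obs-1 and deal them into k disjoint folds."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_components_by_cv(n_obs, candidates, fit_and_score, k=5, seed=0):
    """Pick the candidate number of components with the lowest mean held-out loss.

    fit_and_score(h, train_idx, test_idx) is a user-supplied (hypothetical)
    callable: it fits the model with h components on the rows in train_idx
    and returns the loss measured on the rows in test_idx.
    """
    folds = kfold_indices(n_obs, k, seed)
    cv_loss = {}
    for h in candidates:
        fold_losses = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n_obs) if i not in held_out]
            fold_losses.append(fit_and_score(h, train_idx, test_idx))
        cv_loss[h] = mean(fold_losses)
    # Ties go to the candidate listed first, so list smaller models first.
    best_h = min(cv_loss, key=cv_loss.get)
    return best_h, cv_loss
```

The same folds are reused for every candidate so that the candidates are compared on identical splits.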
Regarding AIC, Wikipedia explains:
To apply AIC in practice, we start with a set of candidate models, and then find the models' corresponding AIC values. There will almost always be information lost due to using a candidate model to represent the "true model," i.e. the process that generated the data. We wish to select, from among the candidate models, the model that minimizes the information loss. We cannot choose with certainty, but we can minimize the estimated information loss.
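Applied to the number of components, the candidate models would be the fits with 1, 2, 3, ... components. A minimal Python sketch of that comparison follows, using the standard formula AIC = 2k - 2 ln(L). The function names are hypothetical, and counting k as an intercept plus one coefficient per component is an assumption made for illustration; other ways of counting the effective parameters of a PLS fit exist.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_components_by_aic(loglik_by_h):
    """Return the number of components with the smallest AIC.

    loglik_by_h maps a candidate number of components h to the maximized
    log-likelihood of the logistic model fitted on those h components.
    Assumption: each model has h + 1 parameters (intercept + h coefficients).
    """
    aic_by_h = {h: aic(ll, h + 1) for h, ll in loglik_by_h.items()}
    best_h = min(aic_by_h, key=aic_by_h.get)
    return best_h, aic_by_h
```

Adding a component is retained only when it raises the log-likelihood by more than 1, since each extra parameter adds 2 to the AIC.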
Well, I wasn't asking about the definitions of cross-validation and AIC. Rather, I was asking how they can be applied to determine the number of components in partial least squares logistic regression.
Thank you!
@Season wrote:
Well, I wasn't asking about the definitions of cross-validation and AIC. Rather, I was asking how they can be applied to determine the number of components in partial least squares logistic regression.
I didn't give you a definition of these items. Both of my comments above are related to how these statistics can be applied.