variety
Obsidian | Level 7

I am using PLS-DA on plants, comparing H-NMR data (250 independent variables) with biological activity (active=1, inactive=0), to identify the constituents responsible for activity. I want to use the factor loadings obtained from the PLS-DA to see the relative importance of the variables. As my sample size is 12, I will get 12 factors (if I don't use cross validation). My question is: which factor should I use to best predict which variables are important for the outcome? I read in epidemiology research that after obtaining factor scores, we can do logistic regression to see which factor correlates with the outcome. So if I want to use logistic regression, how do I use the 'Y' value in it? My 'Y' value belongs to the original observations, so how do I apply it to the factor scores? Also, if my cross-validated results tell me to use only 3 factors, how do I use the 3 factors and the 'Y' values for logistic regression? Note: I am using SAS 9.3 and I am not a statistician. Please provide references if you can. Thank you.

6 REPLIES
PaigeMiller
Diamond | Level 26

I'm not following your logic.

You are using PLS-DA to classify observations into a finite number of categories ... correct? (That's the normal usage of the method)

Then you want to do a logistic regression? With what variable(s) as the response and what variable(s) as the predictors?

Where does classifying the observations into categories that you did in PLS-DA come into play in the logistic regression?

How does a Discriminant Analysis and a Logistic Regression relate to one another? Are they using the same variables, or different variables, or some of the same variables and some different variables? Please answer for both X and Y.

But ... here's my main point ... if you want to perform some sort of logistic regression as the final analysis, why even bother with PLS-DA? Why not just do the logistic regression? If you use PLS with a binary Y variable, this has the benefits of PLS but in a logistic regression setting, and more importantly, this method finds the optimal linear combinations of the X variables to predict the Y variable(s). If you use the PLS-DA results, there is no reason to suspect that the scores from PLS-DA will be predictive of the logistic regression binary response variable.

And you have a sample size of 12 ... well ... um ...

--
Paige Miller
variety
Obsidian | Level 7

Hi PaigeMiller,

First of all, thank you for your response. I appreciate it.

To answer your questions:

I can't use logistic regression on my original data set because my independent variables are highly collinear. That is why I am using PLS-DA to deal with multicollinearity.

I am using PLS-DA to extract latent factors. Assuming these latent factors are orthogonal to each other, I want to use them as independent variables in logistic regression. I am still not sure how, or what, to use as the dependent variable in logistic regression (that was part of my original question).

 

Here is my original idea on using logistic regression after PLS-DA:

PLS-DA gives me a number of latent factors equal to the number of observations if I don't use cross validation (in my case, 12). Each latent factor gives me a different set of VIP scores for the variables. My final goal in performing PLS-DA is to identify the variables important for the activity, so I need to identify variables with high VIP scores. Now my question is: which factor (of the 12) should I choose to identify variables with high VIP scores? The VIP scores of the first factor are different from those of the second factor, and so on.

Now if I use cross validation, I get 3 latent factors to use. Again the question arises: should I use the VIP scores of the third latent factor, or of the first or second?

So I intend to use conditional logistic regression to determine whether the factor scores obtained by applying PLS-DA are associated with the activity. By identifying the factor that is associated with the activity, I intend to use the VIP scores of the variables for that factor.

I got the idea from this paper: https://doi.org/10.1371/journal.pone.0155892 (if you have time, please go through the methods).

 

I hope I am clear in explaining.
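A side note on the "which factor's VIP scores" question: under the usual (Wold) definition, VIP is one number per variable computed over all retained factors together, with each factor weighted by how much variation in Y it explains. It changes when the number of retained factors changes, but it is not a separate quantity for each factor. Below is a minimal sketch in plain Python (not SAS) of that definition. The weights and explained sums of squares are made-up toy values, the function name `vip` is hypothetical, and I am assuming this definition matches what your software reports.

```python
# Sketch of the usual VIP definition, computed over ALL retained factors.
# Toy inputs only: 2 factors, 3 variables. These numbers are invented.

def vip(weights, ss_y):
    """weights[a][j]: PLS weight of variable j on factor a.
    ss_y[a]: sum of squares of Y explained by factor a.
    Returns one VIP value per variable."""
    n_vars = len(weights[0])
    total_ss = sum(ss_y)
    result = []
    for j in range(n_vars):
        acc = 0.0
        for w, ss in zip(weights, ss_y):
            norm_sq = sum(v * v for v in w)      # squared length of the weight vector
            acc += ss * (w[j] ** 2) / norm_sq    # factor's share, scaled by explained SS
        result.append((n_vars * acc / total_ss) ** 0.5)
    return result

weights = [[0.8, 0.6, 0.0],   # factor 1 (hypothetical)
           [0.0, 0.6, 0.8]]   # factor 2 (hypothetical)
ss_y = [3.0, 1.0]             # factor 1 explains more of Y than factor 2

vip_scores = vip(weights, ss_y)
print(vip_scores)
```

Because the squared VIP values average to 1 by construction, a common rule of thumb treats variables with VIP well below 1 as less influential; any cutoff is a judgment call.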

PaigeMiller
Diamond | Level 26

 

"I can't use logistic regression on my original set of data. Because my independent variables are highly collinear. That is why I am using PLS-DA to deal with multicollinearity."

You use the logistic version of PLS to handle the collinearity. This is quite common when you are performing spectroscopic studies. There's no discriminant analysis needed here.

As they say in the game of MONOPOLY: Go directly to Logistic Regression. Do not pass PLS-DA!

--
Paige Miller
variety
Obsidian | Level 7

I don't understand what you mean by "logistic version of PLS".

I am new to PLS and SAS, so if you can provide an example or reference, it would be very helpful.

Thanks.

PaigeMiller
Diamond | Level 26

PROC PLS with a binary response variable, or see:

https://cedric.cnam.fr/fichiers/RC906.pdf
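To make "PLS with a binary response" a little more concrete, here is a minimal sketch in plain Python (not SAS) of a one-response PLS fit by the NIPALS steps, using the 0/1 activity as Y. The data are invented toy values (6 samples, 4 variables) and the function name `pls1` is hypothetical; it is only meant to illustrate the mechanics and is not a description of what PROC PLS does internally. The point to notice is that each factor yields one score per observation, so the original Y lines up with the factor scores row by row.

```python
# Minimal PLS1 (single response) via NIPALS, pure standard library.
# Toy data only: values below are invented for illustration.

def pls1(X, y, n_comp):
    n, p = len(X), len(X[0])
    # Center each column of X, and center y.
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    scores, weights = [], []
    for _ in range(n_comp):
        # Weight vector: direction in X most covariant with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        # Scores: one value per observation (row of X).
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        # X loadings and y loading for this factor.
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        scores.append(t)
        weights.append(w)
    return scores, weights

X = [[2.1, 0.5, 1.0, 3.2],
     [1.9, 0.7, 1.2, 3.0],
     [2.3, 0.4, 0.9, 3.1],
     [0.8, 1.5, 1.1, 2.9],
     [1.0, 1.6, 1.0, 3.3],
     [0.7, 1.4, 1.3, 3.0]]
y = [1, 1, 1, 0, 0, 0]   # active = 1, inactive = 0

scores, weights = pls1(X, y, n_comp=2)

# Pair each observation's Y with its factor scores: (y, t1, t2) per row.
rows = list(zip(y, scores[0], scores[1]))
for r in rows:
    print(r)
```

Each tuple in `rows` is what a follow-up model would take as one observation: the original 0/1 response together with that sample's scores on the retained factors. With only a handful of samples, any such follow-up fit should be read with caution.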

--
Paige Miller
