carlosmirandad
Obsidian | Level 7

Hi,

I am using PROC PLS to identify orthogonal factors from a set of predictors that has a lot of redundancy (and very high multicollinearity). PLS works great: the factors, and the regression models built on those factors, are fine.

 

I also have additional observations that were not used when generating the PLS model (e.g. a validation set, new data, etc.) These observations need to be "scored" using the model in order to generate both predictions and the factors for them. Unfortunately, PLS does not produce a "scoring output dataset" but I have a macro that will generate one using the information in these two ODS tables: XVariableCenScale and XWeights.

 

When I use that "scoring output dataset" with PROC SCORE, I can successfully reproduce the first factor but not the rest of the factors. There are some differences in the second factor, bigger differences in the third factor, etc. The scored factors seem to become worse and worse.

 

Perhaps the assumptions of this macro are not true. I believe that I am using the correct ODS tables because the first factor is correct. But why not the other ones? I know PLS builds the factors in an iterative fashion. Maybe there is no way to derive those factors with a simple matrix multiplication (which is what PROC SCORE does). Maybe the scoring has to be implemented in an iterative fashion too (e.g. start with scaling/centering the data, then get the first factor, then operate on the remaining data, then get the second factor, etc.).
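
To make the iterative idea above concrete, here is a minimal sketch in Python (not SAS), assuming a NIPALS-style PLS with a single response, where each weight vector applies to the deflated predictors rather than to the original ones. Whether the XWeights table from PROC PLS follows exactly this convention is an assumption, and the sketch also needs the X loadings, which the macro described above does not use. All function names and the toy numbers are hypothetical. Under those assumptions, multiplying the centered/scaled data by the raw weights reproduces only the first factor, while either deflating step by step or first converting the weights into "rotated" weights reproduces every factor.

```python
# Sketch of single-response PLS (NIPALS-style). Names are illustrative only.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def column(X, j):
    return [row[j] for row in X]

def fit_pls1(X, y, n_factors):
    """Fit on predictors X (list of rows) and response y, assumed to be
    already centered/scaled. Returns weights W, loadings P and training
    scores T, each as a list holding one vector per factor."""
    X = [row[:] for row in X]
    y = y[:]
    n_vars = len(X[0])
    W, P, T = [], [], []
    for _ in range(n_factors):
        # Weight: unit-length direction of maximal covariance with y.
        w = [dot(column(X, j), y) for j in range(n_vars)]
        norm = dot(w, w) ** 0.5
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in X]            # factor scores
        tt = dot(t, t)
        p = [dot(column(X, j), t) / tt for j in range(n_vars)]  # X loadings
        q = dot(y, t) / tt
        # Deflate: remove what this factor explains before the next one.
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        y = [yi - ti * q for yi, ti in zip(y, t)]
        W.append(w)
        P.append(p)
        T.append(t)
    return W, P, T

def score_by_deflation(Xnew, W, P):
    """Iterative scoring: score one factor, deflate the new data, repeat."""
    X = [row[:] for row in Xnew]
    scores = []
    for w, p in zip(W, P):
        t = [dot(row, w) for row in X]
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        scores.append(t)
    return scores

def score_direct(Xnew, V):
    """Plain matrix multiplication of the data by the columns in V."""
    return [[dot(row, v) for row in Xnew] for v in V]

def rotated_weights(W, P):
    """Weights that apply to the ORIGINAL (undeflated) predictors:
    r_a = w_a - sum over k < a of (p_k . w_a) * r_k."""
    R = []
    for a, w in enumerate(W):
        r = w[:]
        for k in range(a):
            c = dot(P[k], w)
            r = [rj - c * rkj for rj, rkj in zip(r, R[k])]
        R.append(r)
    return R

# Made-up toy data (treated here as if already centered/scaled).
X_train = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 2.0],
           [4.0, 3.0, 3.5], [5.0, 6.0, 4.0]]
y_train = [1.0, 2.0, 2.5, 4.0, 5.5]
X_new = [[2.5, 3.0, 1.0], [4.5, 2.0, 3.0]]

W, P, T = fit_pls1(X_train, y_train, 3)
print("iterative  :", score_by_deflation(X_new, W, P))
print("raw weights:", score_direct(X_new, W))
print("rotated    :", score_direct(X_new, rotated_weights(W, P)))
```

If this convention matches what PROC PLS writes out, the rotated weights would be the coefficients to hand to PROC SCORE, since they turn the iterative procedure back into a single matrix multiplication on the centered/scaled data.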

 

Does anybody have pointers to get this to work right for all factors? Or does anybody know of an existing procedure that I can use? (I don't want to reinvent the wheel if something is already out there.)

 

I am attaching the code of the macro and an example with simulated data for anybody that wants to reproduce this issue.

 

I would appreciate any contributions!

 

Thank you,
Carlos

 

3 REPLIES
carlosmirandad
Obsidian | Level 7

I found the following comment in the SAS user guide: "You can save the model fit by the PLS procedure in a data set and apply it to new data by using the SCORE procedure."

(source: http://support.sas.com/documentation/cdl//en/statug/68162/HTML/default/statug_pls_overview01.htm) 

 

I think this refers to scoring new data to generate predictions of the dependent variable (i.e. the final solution). What I am looking for is not just that, but also how to generate the factors themselves for the new data (using PROC SCORE or something else). If anybody knows an option to do that, please advise.

 

Thank you,

Carlos

 

Reeza
Super User

Wouldn't you run it through PROC FACTOR or PROC PRINCOMP?

carlosmirandad
Obsidian | Level 7
Thanks, Reeza, for the question. Factor analysis and principal components do not take into account the target variable, which is key for the problem I am solving. In the past I have used principal components to get orthogonal predictors and then used regression with variable selection to find the components that track with the target variable. I was hoping to use PLS now, since this method does both things: the rotation and the identification of factors that predict the target well.

