Programming the statistical procedures from SAS

Issue scoring data using PROC PLS outputs: the scored factors don't match the original factors

Contributor
Posts: 34


Hi,

I am using PROC PLS to extract orthogonal factors from a set of predictors that has a lot of redundancy (and very high multicollinearity). PLS works great: the factors, and the regression models built on those factors, are fine.

 

I also have additional observations that were not used when generating the PLS model (e.g. a validation set, new data, etc.) These observations need to be "scored" using the model in order to generate both predictions and the factors for them. Unfortunately, PLS does not produce a "scoring output dataset" but I have a macro that will generate one using the information in these two ODS tables: XVariableCenScale and XWeights.

 

When I use that "scoring output dataset" with PROC SCORE, I can successfully reproduce the first factor but not the rest of the factors. There are some differences in the second factor, bigger differences in the third factor, etc. The scored factors seem to get worse and worse with each additional factor.

 

Perhaps the assumptions of this macro are not true. I believe that I am using the correct ODS tables because the first factor is correct. But why not the other ones? I know PLS builds the factors in an iterative fashion. Maybe there is no way to derive those factors with a simple matrix multiplication (which is what PROC SCORE does). Maybe the scoring has to be implemented in an iterative fashion too (e.g. start with scaling/centering the data, then get the first factor, then operate on the remaining data, then get the second factor, etc.)
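
The suspicion above can be checked numerically. What follows is a minimal Python/NumPy sketch, not SAS code and not the attached macro. It assumes a NIPALS-style PLS with a single response, in which each weight vector is computed from predictors that have already been deflated by the earlier factors. Whether the XWeights table from PROC PLS holds exactly these weights is an assumption to verify. The function `pls1_nipals`, the helpers `score_factors` and `predict`, and the simulated data are all hypothetical names introduced for illustration. Under these assumptions, multiplying the centered/scaled predictors by the weights `W` reproduces only the first factor, while multiplying by `R = W (P'W)^-1`, with `P` the X loadings, reproduces all of them in a single matrix product.

```python
import numpy as np

def pls1_nipals(X0, y0, n_factors):
    """NIPALS PLS for one response on already centered/scaled data."""
    X, y = X0.copy(), y0.copy()
    n, p = X.shape
    W = np.zeros((p, n_factors))   # weights, defined on the DEFLATED X
    P = np.zeros((p, n_factors))   # X loadings
    T = np.zeros((n, n_factors))   # scores (the factors)
    q = np.zeros(n_factors)        # y loadings
    for a in range(n_factors):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                  # score from the current (deflated) X
        tt = t @ t
        p_a = X.T @ t / tt
        q[a] = y @ t / tt
        X = X - np.outer(t, p_a)   # deflate X before the next factor
        y = y - q[a] * t           # deflate y
        W[:, a], P[:, a], T[:, a] = w, p_a, t
    return W, P, T, q

# Simulated, collinear predictors (illustrative data only).
rng = np.random.default_rng(0)
n, p, k = 200, 6, 3
latent = rng.normal(size=(n, 2))
X_raw = latent @ rng.normal(size=(2, p)) + 0.3 * rng.normal(size=(n, p))
y_raw = latent[:, 0] - 0.5 * latent[:, 1] + 0.2 * rng.normal(size=n)

# Centering and scaling statistics come from the training data only.
mean, std = X_raw.mean(axis=0), X_raw.std(axis=0, ddof=1)
y_mean, y_std = y_raw.mean(), y_raw.std(ddof=1)
X0 = (X_raw - mean) / std
y0 = (y_raw - y_mean) / y_std

W, P, T, q = pls1_nipals(X0, y0, k)

naive = X0 @ W                     # one multiplication with the raw weights
R = W @ np.linalg.inv(P.T @ W)     # weights re-expressed for undeflated X
direct = X0 @ R

naive_err = np.abs(naive - T).max(axis=0)
direct_err = np.abs(direct - T).max(axis=0)
print("max abs error per factor, X0 @ W:", naive_err)
print("max abs error per factor, X0 @ R:", direct_err)

def score_factors(X_new_raw):
    """Factors for new rows: training centering/scaling, then one product."""
    return ((X_new_raw - mean) / std) @ R

def predict(X_new_raw):
    """Predictions for new rows, mapped back to the original y units."""
    return y_mean + y_std * (score_factors(X_new_raw) @ q)
```

If this carries over to the PROC PLS output, a scoring data set for PROC SCORE would be built from the columns of `R` rather than from the weights themselves, which requires the X loadings in addition to the weights and the centering/scaling values.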

 

Does anybody have pointers to get this to work right for all factors? Or does anybody know of an existing procedure that I can use? (I don't want to reinvent the wheel if something is already out there...)

 

I am attaching the code of the macro and an example with simulated data for anybody that wants to reproduce this issue.

 

I would appreciate any contributions!

 

Thank you,
Carlos

 

Contributor
Posts: 34

Re: Issue scoring data using PROC PLS outputs: the scored factors don't match the original factors


I found the following comment in the SAS user guide: "You can save the model fit by the PLS procedure in a data set and apply it to new data by using the SCORE procedure."

(source: http://support.sas.com/documentation/cdl//en/statug/68162/HTML/default/statug_pls_overview01.htm) 

 

I think this refers to scoring new data to generate dependent variable predictions (e.g. the solution). What I am looking for is not just that, but also how to generate the factors themselves for the new data (using PROC SCORE or something else). If anybody knows an option to do that, please advise.

 

Thank you,

Carlos

 

Super User
Posts: 18,569

Re: Issue scoring data using PROC PLS outputs: the scored factors don't match the original factors

Wouldn't you run it through PROC FACTOR or PROC PRINCOMP?

Contributor
Posts: 34

Re: Issue scoring data using PROC PLS outputs: the scored factors don't match the original factors

Thanks, Reeza, for the question. Factor analysis and principal components do not take the target variable into account, which is key for the problem I am solving. In the past I have used principal components to get orthogonal predictors and then used regression with variable selection to find the components that track the target variable. I was hoping to use PLS now, since this method does both things: the rotation and the identification of factors that predict the target well.