- Issue scoring data using PROC PLS outputs: the scored factors don't ma...

Posted 03-04-2016 08:14 PM
(1611 views)

Hi,

I am using PROC PLS to identify orthogonal factors from a set of predictors that has a lot of redundancy (and very high multicollinearity.) PLS works great. The factors and the regression models with those factors are fine.

I also have additional observations that were not used when generating the PLS model (e.g. a validation set, new data, etc.) These observations need to be "scored" using the model in order to generate both predictions and the factors for them. Unfortunately, PLS does not produce a "scoring output dataset" but I have a macro that will generate one using the information in these two ODS tables: XVariableCenScale and XWeights.

When I use that "scoring output dataset" with PROC SCORE, I can successfully reproduce the first factor but not the rest of the factors. There are some differences in the second factor, bigger differences in the third factor, etc. The scored factors seem to become worse and worse.

Perhaps the assumptions of this macro are not true. I believe that I am using the correct ODS tables because the first factor is correct. But why not the other ones? I know PLS builds the factors in an iterative fashion. Maybe there is no way to derive those factors with a simple matrix multiplication (which is what PROC SCORE does). Maybe the scoring has to be implemented in an iterative fashion too (e.g. start with scaling/centering the data, then get the first factor, then operate on the remaining data, then get the second factor, etc.).
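The iterative idea described above can be sketched in a few lines. This is a minimal illustration in plain Python, not PROC PLS itself: it assumes a single response, a NIPALS-style fit in which each weight vector is computed from the deflated predictors, and that the X loadings are available alongside the weights (in SAS they would presumably come from an ODS table such as XLoadings; that is an assumption to check against your release). All function names and the toy data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def standardize(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def fit_pls1(X, y, n_factors):
    """Toy NIPALS PLS1 fit (assumed algorithm, single response)."""
    n, cols = len(X), list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    E = [standardize(row, means, stds) for row in X]   # centered/scaled X
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]                        # centered response
    W, P, T = [], [], []
    for _ in range(n_factors):
        w = [dot(col, f) for col in zip(*E)]           # weight from current residual X
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                 # factor scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*E)]      # X loadings
        # Deflate: remove the part of X explained by this factor.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        W.append(w); P.append(p); T.append(t)
    return means, stds, W, P, T

def score_naive(row, means, stds, W):
    """One matrix multiplication with the weights: only factor 1 is right."""
    z = standardize(row, means, stds)
    return [dot(z, w) for w in W]

def score_deflated(row, means, stds, W, P):
    """Score factor by factor, deflating the observation in between."""
    z = standardize(row, means, stds)
    scores = []
    for w, p in zip(W, P):
        t = dot(z, w)
        scores.append(t)
        z = [zj - t * pj for zj, pj in zip(z, p)]
    return scores

# Hypothetical toy data: 6 observations, 3 correlated predictors.
X = [[1, 2, 5], [2, 1, 3], [3, 4, 6], [4, 3, 2], [5, 6, 4], [6, 5, 1]]
y = [7, 4, 10, 5, 10, 6]
means, stds, W, P, T = fit_pls1(X, y, n_factors=2)

naive = [score_naive(r, means, stds, W) for r in X]
deflated = [score_deflated(r, means, stds, W, P) for r in X]
for i in range(len(X)):
    print(i, [T[0][i], T[1][i]], naive[i], deflated[i])
```

Under these assumptions the naive scores agree with the fitted factors only for the first factor, while the deflated scores agree for all of them. The same result can be obtained with a single matrix multiplication if the weights are first rotated to W(P'W)^-1, where W and P hold the weights and loadings as columns; a coefficient matrix of that form is what a PROC SCORE style multiplication would need.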

Does anybody have pointers to get this to work right for all factors? Or does anybody know of an existing procedure that I can use? (I don't want to reinvent the wheel if something is already out there...)

I am attaching the code of the macro and an example with simulated data for anybody that wants to reproduce this issue.

I would appreciate any contributions!

Thank you,

Carlos

3 REPLIES

I found the following comment in the SAS user guide: "You can save the model fit by the PLS procedure in a data set and apply it to new data by using the SCORE procedure."

(source: http://support.sas.com/documentation/cdl//en/statug/68162/HTML/default/statug_pls_overview01.htm)

I think this refers to scoring new data to generate predictions of the dependent variable (e.g. the solution). What I am looking for is not just that, but also how to generate the factors themselves for the new data (using PROC SCORE or something else). If anybody knows an option to do that, please advise.

Thank you,

Carlos

Wouldn't you run it through PROC FACTOR or PROC PRINCOMP?

Thanks, Reeza, for the question. Factor analysis and principal components do not take into account the target variable, which is key for the problem I am solving. In the past I have used principal components to get orthogonal predictors and then used regression with variable selection to find the components that track the target variable. I was hoping to use PLS now, since this method does both things: the rotation and the identification of factors that predict the target well.
