PCA with proc princomp

07-29-2017 05:32 PM

Hi,

I am using proc princomp to reduce the number of variables in a model. I have no previous experience with PCA, so I have a simple question.

Using proc princomp I get the eigenvalues and the coefficients (eigenvectors) that relate the components to the raw variables.

I have 30 raw variables, and with PCA I can reduce them to 7 principal components that retain 95% of the dataset's variance.

I want to use these 7 PCA variables in my model, but I have a question:

Once I have built the model with the 7 PCA variables, I want to validate it on a test dataset. How do I get the 7 PCA variables for the test dataset? Do I have to use the coefficients obtained from proc princomp to calculate them for the test dataset and then apply the model?

Any advice will be greatly appreciated.

All Replies

Posted in reply to juanvg1972

07-29-2017 05:38 PM

The easiest way to get scores on additional observations is to create one data set and give the active observations a freq of 1 and the passive observations that you want scored a freq of 0. This example illustrates.

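```
/* Flag active observations (f=1) and passive ones that are only scored (f=0) */
data iris;
   set sashelp.iris;
   f = species ne 'Virginica';
run;

/* Observations with f=0 are left out of the fit but still get scores in OUT= */
proc princomp data=iris out=scores;
   freq f;
run;

proc print data=scores; run;
```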
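As a follow-up sketch, the scored output can be split back into the rows used to fit the components and the rows that were only scored. The data set names `fit_scores` and `passive_scores` are hypothetical, and the setup is repeated so the block runs on its own.

```sas
/* Setup repeated from the example above so this block is self-contained */
data iris;
   set sashelp.iris;
   f = species ne 'Virginica';
run;

proc princomp data=iris out=scores noprint;
   freq f;
run;

/* Split on the frequency flag: f=1 rows fit the components, f=0 rows were scored only */
data fit_scores passive_scores;
   set scores;
   if f = 1 then output fit_scores;
   else output passive_scores;
run;

proc print data=passive_scores(obs=5); run;
```

The Prin1, Prin2, ... columns in `passive_scores` are then the PCA variables for the held-out rows, ready to be passed to whatever model was fitted on `fit_scores`.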

Posted in reply to WarrenKuhfeld

07-29-2017 05:56 PM

Thanks Warren,

I'm sorry, I don't understand what you are doing with freq.

My problem is how to work with the PCA variables in the test dataset.

If you could explain in more detail, I would be grateful.

Solution

07-29-2017
06:20 PM

Posted in reply to juanvg1972

07-29-2017 06:06 PM

Imagine you have two data sets. You want to fit a model on iris and get scores on test. So you concatenate them and create a frequency variable f that has a value of 1 for the observations you want to use to fit the model and 0 for the passive observations you only want to score. My first two steps make an analysis data set and a test data set from the iris data. The third concatenates them and creates the frequency variable. You then have a data set like the one I showed in my first post. Look at the proc print results to see how this works.

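```
/* Analysis (training) data: every species except Virginica */
data iris;
   set sashelp.iris;
   if species ne 'Virginica';
run;

/* Test data: the Virginica observations */
data test;
   set sashelp.iris;
   if species eq 'Virginica';
run;

/* Concatenate; IN= sets i to 1 for rows from iris and 0 for rows from test */
data all;
   set iris(in=i) test;
   f = i;
run;

proc print data=all; run;

/* Rows with f=0 are left out of the fit but still get scores in OUT= */
proc princomp data=all out=scores;
   freq f;
run;

proc print data=scores; run;
```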
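For comparison, here is a sketch of the route the original question describes: save the coefficients from the training fit and apply them to the test data in a separate step. It assumes PROC PRINCOMP's OUTSTAT= data set is passed to PROC SCORE; the names `train`, `pca_stats`, and `test_scores`, and the choice of `n=2`, are illustrative only.

```sas
/* Training and test sets built the same way as in the post above */
data train;
   set sashelp.iris;
   if species ne 'Virginica';
run;

data test;
   set sashelp.iris;
   if species eq 'Virginica';
run;

/* Fit the components on the training data only and save the statistics */
/* (means, standard deviations, and scoring coefficients) in OUTSTAT=   */
proc princomp data=train outstat=pca_stats n=2 noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Apply the saved coefficients to the test data; PROC SCORE standardizes */
/* with the training means and standard deviations stored in pca_stats    */
proc score data=test score=pca_stats out=test_scores;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

proc print data=test_scores(obs=5); run;
```

This should give the same test-set scores as the FREQ approach; it is mainly useful when the test data arrive later and cannot be concatenated with the training data before the fit.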

Posted in reply to WarrenKuhfeld

07-29-2017 06:20 PM

Now I understand. Thanks!!!

Posted in reply to juanvg1972

07-31-2017 09:01 AM

There is also PROC VARCLUS, which you can use to reduce the number of variables.
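A minimal sketch of that suggestion on the same iris data, assuming two clusters is a reasonable target here (the `maxclusters=2` value is only an illustration):

```sas
/* Group correlated variables into clusters; one representative variable */
/* per cluster can then be kept instead of all the original variables    */
proc varclus data=sashelp.iris maxclusters=2 short;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Unlike PCA, this keeps a subset of the original variables rather than building new combined ones, so the test dataset needs no extra scoring step.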