Hi,
I am using PROC PRINCOMP to reduce the number of variables in a model. I don't have previous experience with PCA, so I have a simple question.
Using PROC PRINCOMP I get the eigenvalues and the coefficients that relate the principal components to the raw variables.
I have 30 raw variables, and with PCA I can reduce them to 7 principal components that keep 95% of the data set's variance.
I want to work with these 7 principal components in my model, but now I have a doubt:
Once I have built the model with the 7 principal components, I want to validate it on a test data set. How do I get the 7 principal components for the test data set? Do I have to use the coefficients obtained from PROC PRINCOMP to calculate them for the test data set and then apply the model?
Any advice will be greatly appreciated.
- Tags:
- PROC PRINCOMP
The easiest way to get scores on additional observations is to create one data set and give the active observations a freq of 1 and the passive observations that you want scored a freq of 0. This example illustrates.
data iris;
   set sashelp.iris;
   /* f = 1: observation is used to fit the components; f = 0: scored only */
   f = (species ne 'Virginica');
run;

proc princomp data=iris out=scores;
   freq f;
run;

proc print data=scores; run;
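If you only want the scores of the passive observations afterwards, a minimal sketch could look like the following. It relies on the behavior described above (rows with a freq of 0 are left out of the fit but still appear, scored, in the OUT= data set). The explicit VAR list, the N=2 limit, and the data set name test_scores are my own assumptions for illustration, not part of the original example.

```sas
data iris;
   set sashelp.iris;
   /* 1 = active (fits the components), 0 = passive (scored only) */
   f = (species ne 'Virginica');
run;

/* N=2 keeps only the first two components; adjust to your own case */
proc princomp data=iris out=scores n=2 noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
   freq f;
run;

/* test_scores is a hypothetical name: keep only the passive rows */
data test_scores;
   set scores;
   where f = 0;
run;

proc print data=test_scores(obs=5); run;
```

The Prin1 and Prin2 columns in test_scores are then the inputs you would pass to the model when validating it.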
Thanks Warren,
I am sorry, I don't understand what you are doing with freq.
My problem is how to work with the PCA variables in the test data set.
If you could explain in more detail, I would be grateful.
Imagine you have two data sets. You want to fit a model on iris and get scores on test. So you concatenate them and make a frequency variable f, that has values of 1 for the observations that you want to use to fit the model and 0 for the passive observations you want to score. My first two steps make an analysis and test data set from the iris data. The third concatenates them and creates the frequency variable. Then you have a data set like I showed you in my first post. Look at the proc print results to see how this works.
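/* Analysis data: the observations used to fit the components */
data iris;
   set sashelp.iris;
   if species ne 'Virginica';
run;

/* Test data: the observations to be scored only */
data test;
   set sashelp.iris;
   if species eq 'Virginica';
run;

/* Concatenate; i = 1 for rows read from iris, 0 for rows read from test */
data all;
   set iris(in=i) test;
   f = i;
run;

proc print data=all; run;

proc princomp data=all out=scores;
   freq f;
run;

proc print data=scores; run;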
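As for the original question about using the coefficients directly: another route, not the one shown above, is to save the PCA statistics with OUTSTAT= and apply them to the test data with PROC SCORE. The sketch below assumes the iris split from this example. The data set names pcastat and test_scores, the explicit VAR list, and N=2 are my own choices for illustration (with 30 variables and 7 components you would use N=7 and your own variable list).

```sas
data iris;
   set sashelp.iris;
   if species ne 'Virginica';
run;

data test;
   set sashelp.iris;
   if species eq 'Virginica';
run;

/* Fit on the analysis data only; OUTSTAT= stores the means, standard
   deviations, and scoring coefficients (the _TYPE_='SCORE' rows) */
proc princomp data=iris outstat=pcastat n=2 noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Standardize the test rows with the stored means and standard deviations,
   then apply the stored coefficients to create Prin1 and Prin2 */
proc score data=test score=pcastat out=test_scores;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

proc print data=test_scores(obs=5); run;
```

Both routes should give the same component scores for the test rows, because in each case the test data are standardized with the statistics of the analysis data, not their own.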
Now I understand. Thanks!!!
There is also PROC VARCLUS, which you can use to reduce the number of variables.
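A minimal sketch of what that could look like, using sashelp.iris only as a stand-in for your own data. The MAXCLUSTERS=2 value is an arbitrary assumption; you would pick the number of clusters from your own output.

```sas
/* Group the variables into clusters; SHORT trims the printed output */
proc varclus data=sashelp.iris maxclusters=2 short;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Unlike PCA, this keeps original variables: a common practice is to retain one representative variable per cluster, for example the one with the lowest 1-R**2 ratio in the output.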