juanvg1972
Pyrite | Level 9

Hi,

 

I am using PROC PRINCOMP to reduce the number of variables in a model. I have no previous experience with PCA, so I have a simple question.

PROC PRINCOMP gives me the eigenvalues and the coefficients that relate the principal components to the raw variables.

I have 30 raw variables, and with PCA I can reduce them to 7 principal components that retain 95% of the data set's variance.

I want to use these 7 PCA variables in my model, but I am unsure about one thing:

Once I have built the model with the 7 PCA variables, I want to validate it with a test data set. How do I get the 7 PCA variables for my test data set? Do I have to use the coefficients obtained from PROC PRINCOMP to calculate the 7 PCA variables for the test data set and then apply the model?

 

Any advice will be greatly appreciated.

5 REPLIES 5
WarrenKuhfeld
Rhodochrosite | Level 12

The easiest way to get scores on additional observations is to create one data set and give the active observations a freq of 1 and the passive observations that you want scored a freq of 0.  This example illustrates.

data iris;
  set sashelp.iris;
  f = species ne 'Virginica';
  run;

proc princomp out=scores;
   freq f;
run;

proc print; run;
juanvg1972
Pyrite | Level 9

Thanks Warren,

 

I am sorry, I don't understand what you are doing with freq.

My problem is how to work with the PCA variables in the test data set.

If you could explain in more detail, I would be grateful.

 

 

WarrenKuhfeld
Rhodochrosite | Level 12

Imagine you have two data sets. You want to fit a model on iris and get scores on test. So you concatenate them and make a frequency variable f, that has values of 1 for the observations that you want to use to fit the model and 0 for the passive observations you want to score.  My first two steps make an analysis and test data set from the iris data.  The third concatenates them and creates the frequency variable.  Then you have a data set like I showed you in my first post.  Look at the proc print results to see how this works.


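/* Analysis (training) data: every species except Virginica */
data iris;
  set sashelp.iris;
  if species ne 'Virginica';
run;

/* Test data: the Virginica observations */
data test;
  set sashelp.iris;
  if species eq 'Virginica';
run;

/* Concatenate; f = 1 for rows from iris, 0 for rows from test */
data all;
  set iris(in=i) test;
  f = i;
run;

proc print; run;

/* Rows with f = 0 do not affect the fit but are still scored in OUT= */
proc princomp out=scores;
  freq f;
run;

proc print; run;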
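
The original question asked whether the coefficients from PROC PRINCOMP can be applied to the test data directly. A minimal sketch of that alternative is below. It is not the approach shown above, and it assumes the same iris/test split; the data set names pcastat and test_scores are made up for illustration. The OUTSTAT= data set stores the training means, standard deviations, and scoring coefficients, and PROC SCORE uses them to compute Prin1, Prin2, ... for the test rows.

```sas
/* Sketch only: same training/test split as in the post above. */
data iris;
  set sashelp.iris;
  if species ne 'Virginica';
run;

data test;
  set sashelp.iris;
  if species eq 'Virginica';
run;

/* Fit the components on the training data only and save the
   means, standard deviations, and eigenvectors in OUTSTAT=.
   pcastat is a hypothetical data set name. */
proc princomp data=iris outstat=pcastat noprint;
  var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Standardize the test data with the training statistics and
   apply the saved scoring coefficients. */
proc score data=test score=pcastat out=test_scores;
  var SepalLength SepalWidth PetalLength PetalWidth;
run;

proc print data=test_scores; run;
```

For the 30-variable case in the question, adding N=7 to the PROC PRINCOMP statement should keep only the first seven components, so that only Prin1 through Prin7 are scored.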
juanvg1972
Pyrite | Level 9

Now I understand. Thanks!!!

Ksharp
Super User

There is also PROC VARCLUS, which you can use to reduce the number of variables.
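
A minimal sketch of what that might look like, using sashelp.iris as stand-in data. The MAXEIGEN= value here is only an illustrative threshold, not a recommendation.

```sas
/* Sketch only: group the four numeric iris variables into clusters.
   MAXEIGEN=0.7 is an assumed, illustrative splitting threshold. */
proc varclus data=sashelp.iris maxeigen=0.7 short;
  var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Unlike principal components, this keeps the original variables: a common follow-up is to pick one representative variable from each cluster (for example, the one with the lowest 1-R**2 ratio in the output) and use those in the model.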
