Becky
Three items to discuss
-1) Model diagnostics: if you want to score your modeling data set, see the P= (predicted values) and R= (residuals) options of the OUTPUT statement in the PROC REG documentation, for example output p=phat r=residuals.
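As a rough sketch of item 1, assuming the Fitness example data and model used further down in this post, the OUTPUT statement can write the predicted values and residuals to a named data set. The names Diag, phat, and resid are only illustrative choices.

```sas
* Sketch: save predicted values and residuals for the modeling data;
* Diag, phat, and resid are illustrative names, not required ones;
proc reg data=Fitness;
   model Oxygen = Age Weight RunTime RunPulse RestPulse;
   output out=Diag p=phat r=resid;
run;
quit;

proc print data=Diag;
run;
```

Naming the output data set with OUT= makes it easier to find afterwards than relying on the default data set name.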
-2) Validation: Once your model is done, output the betas with the OUTEST= option on the PROC REG statement. Then use PROC SCORE to score your file. PROC SCORE can be tricky, so see the documentation for details and examples.
Since you have so many predictor variables (see #3 below on that), you will need to build the VAR statement in PROC SCORE dynamically.
I modified the example in the PROC SCORE documentation to do this:
data Fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;
run;
proc reg data=Fitness outest=RegOut;
OxyHat: model Oxygen=Age Weight RunTime RunPulse RestPulse;
output p=phat;
title 'REGRESSION SCORING EXAMPLE';
run;
proc print data=RegOut;
title2 'OUTEST= Data Set from PROC REG';
run;
* RScoreP (the predicted scores) is created by PROC SCORE inside the scoreit macro;
proc score data=Fitness score=RegOut out=RScoreR type=parms;
var Oxygen Age Weight RunTime RunPulse RestPulse;
run;
proc print data=RScoreR;
title2 'Negative Residual Scores for Regression';
run;
* to dynamically only use the variables that you want for scoring;
* modified from PROC SCORE example;
%macro scoreit();
proc contents data=RegOut out=Betas noprint;
run;
data _null_;
set Betas end=eof;
where type=1 and upcase(name) not in('OXYGEN','INTERCEPT','_RMSE_');
* type=1 means numeric variables;
* we do not want the y, intercept, or RMSE to be used as the betas;
call symput('var'||strip(put(_n_,8.)),strip(name));
if eof then call symput('numVars', strip(put(_n_,8.)));
run;
proc score data=Fitness score=RegOut out=RScoreP type=parms;
var %do i=1 %to &numvars;
&&var&i
%end;;
run;
%mend scoreit;
%scoreit;
proc print data=RScoreP;
title2 'Predicted Scores for Regression';
run;
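A possible alternative to the macro loop, sketched here under the assumption that RegOut sits in the WORK library, is to let PROC SQL collect the predictor names into a single macro variable. The macro variable varList and the output data set RScoreP2 are hypothetical names used only for this illustration.

```sas
* Sketch: build the VAR list in one macro variable with PROC SQL;
* assumes RegOut is in WORK. varList and RScoreP2 are illustrative names;
proc sql noprint;
   select name into :varList separated by ' '
   from dictionary.columns
   where libname = 'WORK'
     and memname = 'REGOUT'
     and type = 'num'
     and upcase(name) not in ('OXYGEN', 'INTERCEPT', '_RMSE_');
quit;

proc score data=Fitness score=RegOut out=RScoreP2 type=parms;
   var &varList;
run;
```

This applies the same filter as the DATA step in the macro (numeric variables only, excluding the response, the intercept, and _RMSE_), but needs no %DO loop.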
http://support.sas.com/onlinedoc/913/docMainpage.jsp
-3) Now for your biggest problem. Having 1000 predictor variables is extreme. I'm not sure what you are modeling, but using that many variables in a model is likely to cause overfitting and instability. Each variable adds a dimension, and with 300-1000 of them you will most likely run into the "curse of dimensionality".
-Darryl