Existence of model-applying procedure

New Contributor
Posts: 2

I'm using proc reg to create a linear regression model and am wondering if there is a procedure that will create and apply the equation that is created by the model so that I do not have to. My problem is that I have about 1,000 variables, and the forward selection model spits out about 300 in the model. That's a lot to type! I also have about 300,000 records that I need to score with this model.

Any help would be greatly appreciated (even if it is "sorry, that doesn't exist").

Thanks!

Becky
Frequent Contributor
Posts: 139

Re: Existence of model-applying procedure

Becky

Three items to discuss:
-1) Model diagnostics: if you want to score the dataset you modeled on, use the OUTPUT statement in PROC REG with the P= and R= options (for example, p=phat r=residuals). See the PROC REG documentation.
-2) Validation: once your model is done, write out the betas with the OUTEST= option on the PROC REG statement. Then use PROC SCORE to score your file. PROC SCORE can be tricky, so see the documentation for details and examples.

Since you have so many predictor variables (see #3 below on that), you will need to build the VAR statement in PROC SCORE dynamically.

I modified the example in the PROC SCORE documentation to do this:

data Fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;
run;
proc reg data=Fitness outest=RegOut;
OxyHat: model Oxygen=Age Weight RunTime RunPulse RestPulse;
* RegDiag is an illustrative data set name for the diagnostics output;
output out=RegDiag p=phat r=residuals;
title 'REGRESSION SCORING EXAMPLE';
run;



proc print data=RegOut;
title2 'OUTEST= Data Set from PROC REG';
run;


* RScoreP is created by the scoreit macro further down, so it can only be printed after that macro has run;

proc score data=Fitness score=RegOut out=RScoreR type=parms;
var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

proc print data=RScoreR;
title2 'Negative Residual Scores for Regression';
run;


* to dynamically only use the variables that you want for scoring;
* modified from PROC SCORE example;
%macro scoreit();

proc contents data=RegOut out=Betas noprint;
run;

data _null_;
set Betas end=eof;
where type=1 and upcase(name) not in('OXYGEN','INTERCEPT','_RMSE_');
* type=1 means numeric variables;
* we do not want the y, intercept, or RMSE to be used as the betas;

call symput('var'||strip(put(_n_,8.)),strip(name));
if eof then call symput('numVars', strip(put(_n_,8.)));

run;


proc score data=Fitness score=RegOut out=RScoreP type=parms;
var %do i=1 %to &numvars;
&&var&i
%end;;
run;

%mend scoreit;
%scoreit;

proc print data=RScoreP;
title2 'Predicted Scores for Regression';
run;
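If the numbered macro variables feel heavy, the same variable list can probably be built in one step with PROC SQL. This is only a sketch: it assumes RegOut sits in the WORK library, and the names varlist and RScoreSQL are made up for illustration.

```sas
/* Sketch: collect the predictor names from RegOut into one macro variable. */
/* varlist and RScoreSQL are illustrative names, not from the SAS example.  */
proc sql noprint;
   select name into :varlist separated by ' '
   from dictionary.columns
   where libname = 'WORK' and memname = 'REGOUT'
     and type = 'num'
     and upcase(name) not in ('OXYGEN', 'INTERCEPT', '_RMSE_');
quit;

/* Score with the generated list, same idea as the macro loop. */
proc score data=Fitness score=RegOut out=RScoreSQL type=parms;
   var &varlist;
run;
```

The filter mirrors the WHERE clause in the macro: numeric columns only, minus the response, the intercept, and _RMSE_. DICTIONARY.COLUMNS stores library and member names in upper case, which is why the literals are upper case.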


http://support.sas.com/onlinedoc/913/docMainpage.jsp

-3) Now for your biggest problem. Having 1,000 predictor variables is extreme. I'm not sure what you are modeling, but using that many variables in a model invites overfitting and instability. Each variable adds a dimension, and with 300 to 1,000 of them you will most likely run into the "curse of dimensionality".
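One rough way to see whether a model is overfitting is to hold out part of the data, fit on the rest, and score the holdout with PROC SCORE. The sketch below reuses the Fitness example. The 70/30 split, the seed, and the dataset names Train, Valid, TrainBetas, ValidScored and ValidErr are all assumptions for illustration.

```sas
/* Sketch: random holdout split. The seed and the 0.7 cutoff are arbitrary. */
data Train Valid;
   set Fitness;
   if ranuni(12345) < 0.7 then output Train;
   else output Valid;
run;

/* Fit on the training rows only and save the betas. */
proc reg data=Train outest=TrainBetas noprint;
   OxyHat: model Oxygen=Age Weight RunTime RunPulse RestPulse;
run;
quit;

/* Apply the training betas to the holdout rows. */
proc score data=Valid score=TrainBetas out=ValidScored type=parms;
   var Age Weight RunTime RunPulse RestPulse;
run;

/* The prediction is named after the model label, OxyHat here. */
data ValidErr;
   set ValidScored;
   SqErr = (Oxygen - OxyHat)**2;
run;

proc means data=ValidErr n mean;
   var SqErr;
run;
```

With only the 12 toy rows in Fitness the split is far too small to mean anything; the pattern is what matters. A holdout mean squared error much larger than the training error would be a sign of the overfitting described above.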

-Darryl
New Contributor
Posts: 2

Re: Existence of model-applying procedure

Thanks, Darryl.

I was able to use Proc Score with the macro you spelled out above. I appreciate your help.

Becky
N/A
Posts: 0

Re: Existence of model-applying procedure

One of the key features of Darryl's solution is the "outest=" option for "proc reg". It names a SAS dataset that holds the estimated parameters. That dataset can then be used to automate applying the model, either through another proc that is designed to read it or within your own Data step and/or macro.
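For the Data step route, a minimal sketch might look like the following. It assumes the RegOut and Fitness datasets from Darryl's example; the b0 and b_ names and the Scored dataset are invented here for illustration.

```sas
/* Sketch: apply the OUTEST= betas by hand in a DATA step.           */
/* b0, the b_ prefixed names and Scored are illustrative names only. */
data Scored;
   /* Read the one row of betas once. Variables read by SET are retained. */
   if _n_ = 1 then set RegOut(keep=Intercept Age Weight RunTime RunPulse RestPulse
                              rename=(Intercept=b0 Age=b_Age Weight=b_Weight
                                      RunTime=b_RunTime RunPulse=b_RunPulse
                                      RestPulse=b_RestPulse));
   set Fitness;
   /* Linear predictor: intercept plus beta times value for each variable. */
   phat = b0 + b_Age*Age + b_Weight*Weight + b_RunTime*RunTime
             + b_RunPulse*RunPulse + b_RestPulse*RestPulse;
   drop b0 b_:;
run;
```

The betas are renamed on the way in so they do not collide with the data columns of the same name. With a few hundred predictors, typing the equation out like this stops being practical, which is where PROC SCORE or a macro-generated list earns its keep.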