Ronein

Hello,

I saw this code that creates a regression model to use a person's height to predict the person's weight.

I read the definitions of a training data set and a test (validation) data set:

Training data set: data used to create the model

Test (validation) data set: data used to assess the performance of the regression model

I have 3 questions:

1- Why does the code combine the train table and the test table? (As I understand it, the model should be built on the train table only.)

2- What does outest=est mean (in proc sgplot)?

3- We saved the regression equation in a macro variable and used it to calculate predicted values for new observations from the train data.

Is this the only way to do it?

With 50 independent variables, the code would get very long.

 

 

 


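/* Training data: height and weight for the 19 students in SASHELP.CLASS */
data train;
  set sashelp.class(keep=weight height);
run;

/* Test data: two new heights. WEIGHT does not exist here,
   so it is missing for these rows after stacking */
data test;
  height=71; output;
  height=72; output;
run;

/* Stack train and test into one data set */
data have;
  set train test;
run;

/* Fit weight on height. OUTEST=EST saves the parameter estimates;
   OUTPUT writes a predicted value for every row of HAVE */
proc reg data=have outest=est noprint;
  model weight=height;
  output out=want predicted=predicted;
run;
quit;

proc print data=want noobs;
run;

/* Build a text version of the fitted equation for the plot inset */
data _null_;
  set est;
  call symputx('func', cats('weight=', intercept, '+', height, '*height'));
run;

/* Scatter plot with fitted line, prediction limits (CLI) and
   confidence limits for the mean (CLM); the equation goes in the inset */
proc sgplot data=train aspect=1;
  reg x=height y=weight / cli clm;
  inset "&func" / position=topleft textattrs=graphdata1(size=12);
run;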


 

PaigeMiller


  1. The train and test data sets are combined, and PROC REG then uses only the train rows to estimate the regression equation, because only those rows have non-missing Y values (rows with a missing Y are left out of the fit). It still computes predicted values, and confidence intervals if you request them in the OUTPUT statement, for the records in both the train and test data sets.
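  2. OUTEST=EST is an option of PROC REG, not PROC SGPLOT. It writes the parameter estimates to a data set named EST: the intercept is stored in the variable INTERCEPT and each slope in a variable named after its regressor (here HEIGHT), which is why the DATA _NULL_ step can read them. Surely, after all the time you have been using the forum, you can look up what a specific option does in the SAS documentation.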
  3. Macro variables are the WORST way to save the regression equation, WORST in the sense of don't do it, because it's a lot of effort and there are much simpler ways! The regression equation can be saved in a data set (the OUTEST= data set) and used for later calculations via PROC SCORE; or the model can be fit with a procedure that has a STORE statement, such as PROC GLM, and the predicted values computed by PROC PLM. None of this requires you to code the actual prediction equation; SAS handles it for you regardless of how many variables are in the regression equation. Look at the examples in the PROC SCORE and PROC PLM documentation. As for putting the regression equation in a PROC SGPLOT INSET when there is more than one X variable, this makes no sense: you can't plot the regression with more than one X variable.
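A minimal sketch of the two routes from point 3, reusing the TRAIN and TEST data sets from the question. Here the model is fit on TRAIN only, so the stacked HAVE data set (the missing-Y approach from point 1) is not needed. The model label PRED_WEIGHT, the output data sets SCORED and SCORED_PLM, and the item store WORK.WTMODEL are hypothetical names chosen for illustration. PROC GLM is used for the PLM route because it supports the STORE statement. The naming of the PROC SCORE output variable after the model label follows the PROC SCORE documentation examples as I understand them; check the variable names in SCORED on your release.

```sas
/* Approach 1: OUTEST= data set + PROC SCORE.                      */
/* Fit on TRAIN only; the model label PRED_WEIGHT (hypothetical)   */
/* becomes the value of _MODEL_ in EST.                            */
proc reg data=train outest=est noprint;
  pred_weight: model weight=height;
run;
quit;

/* TYPE=PARMS: EST holds regression coefficients, and its          */
/* INTERCEPT variable is used as the intercept. List only the      */
/* regressors in VAR, not the dependent variable WEIGHT.           */
proc score data=test score=est out=scored type=parms;
  var height;
run;

/* SCORED holds HEIGHT plus the predicted weight, in a variable    */
/* named after the model label.                                    */
proc print data=scored noobs;
run;

/* Approach 2: STORE statement + PROC PLM.                         */
/* PROC GLM fits the same model and saves it in an item store      */
/* (WORK.WTMODEL is a hypothetical name).                          */
proc glm data=train;
  model weight=height;
  store out=work.wtmodel;
run;
quit;

/* Score the new observations from the stored model;               */
/* PREDICTED= names the output variable.                           */
proc plm restore=work.wtmodel;
  score data=test out=scored_plm predicted=pred;
run;

proc print data=scored_plm noobs;
run;
```

With 50 predictors only the variable lists change, for example `model y = x1-x50;` and `var x1-x50;` (hypothetical variable names); neither PROC SCORE nor PROC PLM needs the prediction equation written out.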
--
Paige Miller
