PROC REG DATA=WORK.D201;
MODEL Average_daily_dose_during_the_in = Age__y_ Gender_code BSA AF Hypertension CHF
      Hypoalbuminemia_code AKI_for_T_test Potential_amiodarone_DDI T_test___indication
      VAR33 VAR34 AKI_2C9 AKI_VKORC1
      / selection=stepwise SLE=0.05 SLS=0.20 vif clb clm cli;
STORE WORK.DOSEMODEL / LABEL='Linear Regression';
RUN;
PROC PLM RESTORE=WORK.DOSEMODEL ALPHA=0.05;
SCORE DATA=WORK.PREDICTED out=WORK.NEWDOSE predicted lclm uclm;
RUN;
PROC PRINT DATA=WORK.NEWDOSE;
VAR Age__y_ Gender_code BSA AF Hypertension CHF Hypoalbuminemia_code AKI_for_T_test
    Potential_amiodarone_DDI T_test___indication VAR33 VAR34 AKI_2C9 AKI_VKORC1
    predicted lclm uclm;
RUN;
I just want to compute confidence intervals for the dependent variable for new observations, based on the result of a linear regression model, but no luck.
The item store from PROC REG will only generate predicted values, not intervals, in the PLM procedure. The item store from PROC REG only stores the parameter estimates, so only a subset of options is available. If you have a continuous response, use the GLM procedure to estimate the model and then the PLM procedure to obtain predicted values and intervals for the new data set. For example:
/* SAS CODE FOLLOWS */
data fitness;
input age weight oxygen runtime restpulse runpulse maxpulse;
datalines;
44 89.47 44.609 11.37 62 178 182
40 75.07 45.313 10.07 62 185 185
44 85.84 54.297 8.65 45 156 168
42 68.15 59.571 8.17 40 166 172
38 89.02 49.874 9.22 55 178 180
47 77.45 44.811 11.63 58 176 176
40 75.98 45.681 11.95 70 176 180
43 81.19 49.091 10.85 64 162 170
44 81.42 39.442 13.08 63 174 176
38 81.87 60.055 8.63 48 170 186
44 73.03 50.541 10.13 45 168 168
45 87.66 37.388 14.03 56 186 192
45 66.45 44.754 11.12 51 176 176
47 79.15 47.273 10.60 47 162 164
54 83.12 51.855 10.33 50 166 170
49 81.42 49.156 8.95 44 180 185
51 69.63 40.836 10.95 57 168 172
51 77.91 46.672 10.00 48 162 168
48 91.63 46.774 10.25 48 162 164
49 73.37 50.388 10.08 67 168 168
57 73.37 39.407 12.63 58 174 176
54 79.38 46.080 11.17 62 156 165
52 76.32 45.441 9.63 48 164 166
50 70.87 54.625 8.92 48 146 155
;
run;
data new;
input age weight oxygen runtime restpulse runpulse maxpulse;
datalines;
51 67.25 45.118 11.08 48 172 172
54 91.63 39.203 12.88 44 168 172
51 73.71 45.790 10.47 59 186 188
57 59.08 50.545 9.93 49 148 155
49 76.32 48.673 9.40 56 186 188
48 61.24 47.920 11.50 52 170 176
52 82.78 47.467 10.50 53 170 172
;
run;
proc glm data=fitness;
model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse;
store glmres;
quit;
proc plm restore=glmres;
score data=new out=newout predicted=predoxy lcl=lcl ucl=ucl lclm=lclm uclm=uclm;
quit;
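If you would rather stay in PROC REG, a commonly used alternative (not part of the reply above) is to append the new observations to the training data with the response set to missing. Rows with a missing response are left out of the fit but still receive predicted values and limits from the OUTPUT statement. The following is a minimal sketch that reuses the fitness and new data sets from the example; the names combined, regout, and newobs are made up for illustration.

```sas
/* Sketch: score new observations inside PROC REG itself. */
data combined;
   set fitness new(in=isnew);    /* training rows first, then rows to score */
   newobs = isnew;               /* keep a permanent flag for the new rows */
   if isnew then oxygen = .;     /* missing response: row is excluded from the fit */
run;

proc reg data=combined;
   model oxygen = age weight runtime runpulse restpulse maxpulse;
   /* P= predicted value; LCLM/UCLM = limits for the mean response; */
   /* LCL/UCL = limits for an individual prediction                 */
   output out=regout p=predoxy lclm=lclm uclm=uclm lcl=lcl ucl=ucl;
run;
quit;

/* Show only the scored (new) rows */
proc print data=regout(where=(newobs=1));
   var age weight runtime predoxy lclm uclm lcl ucl;
run;
```

In both this sketch and the PROC PLM example, lclm/uclm bound the mean response at the given predictor values, while lcl/ucl bound a single new observation and are therefore wider.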
Are there confidence intervals shown from your PROC REG? Or are they missing in PROC REG as well?
@PaigeMiller wrote: Are there confidence intervals shown from your PROC REG? Or are they missing in PROC REG as well?
PROC REG did output confidence intervals for the linear regression coefficients.
I'm going out on a limb with some observations.
First, your model has an independent variable named Gender_code. In biology, "gender" typically has 2 categories, and in any case it is almost certainly not a continuous variable. PROC REG is the basic regression procedure and expects the result of an OLS equation like y = mx + b to make sense numerically. If "x" is categorical and takes only two values, then the "m" is unlikely to make much sense. SAS provides a number of regression procedures that allow the use of CLASS variables, which hold categories instead of continuous values. The more categories are involved, the less sense it makes to treat the variable as continuous.
Your example data shows a suspicious number of 0/1 values, as if almost all of those variables are categories.
Perhaps you really should be looking at Proc GLM or another procedure entirely.
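As a rough illustration of that suggestion, the sketch below fits a PROC GLM model with a CLASS statement and then scores the new data with PROC PLM. Which predictors are truly categorical is an assumption here (Gender_code, Hypertension, and CHF are picked only as examples), the predictor list is shortened for readability, and the item store name DOSEMODEL_GLM is hypothetical.

```sas
/* Sketch only: the CLASS variables below are assumed to be categorical. */
proc glm data=WORK.D201;
   class Gender_code Hypertension CHF;
   model Average_daily_dose_during_the_in =
         Age__y_ BSA Gender_code Hypertension CHF / solution;
   store WORK.DOSEMODEL_GLM;      /* hypothetical item store name */
run;
quit;

/* Score the new observations, requesting both kinds of limits */
proc plm restore=WORK.DOSEMODEL_GLM alpha=0.05;
   score data=WORK.PREDICTED out=WORK.NEWDOSE
         predicted=pred lclm=lclm uclm=uclm lcl=lcl ucl=ucl;
run;
```

The scoring data set needs the same CLASS variables, with the same type and coding as in the training data. PROC GLM has no SELECTION= option, so the stepwise step from the original PROC REG call is not reproduced here; PROC GLMSELECT accepts a CLASS statement if variable selection is still wanted.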