Hi everyone,
I am using PROC REG to analyze my study data.
Dependent variable: MCS score
Independent variables: cat2, cat3, age, income
Here cat2 and cat3 are both categorical variables. My reference group is category 1. I have created dummy variables.
My code is as follows:
ods graphics on;
proc reg data=dummyfinal plots(maxpoints=none);
model mcs42=cat2 cat3;
output out=new P=YHAT RSTUDENT=RESID L95M=LOW U95M=HIGH;
run;
quit;
ods graphics off;
After getting the predicted value (YHAT) of the dependent variable, I need to obtain the mean MCS scores across the three categories (cat1, cat2, cat3), along with their confidence intervals, and run multiple comparison tests (e.g., Tukey-Kramer).
Can anyone please help me with the SAS code I should run to obtain these results?
My results should look like this:
Means and SE (MCS)
Group | Mean (SE) | p value
---|---|---
cat1 (reference group) | 40.45 (0.94) ∗∗∗ | <0.001
cat2 | 43.76 (0.71) ∗∗∗ | <0.001
cat3 | 46.96 (0.78) ∗∗∗ | <0.001
Some things really aren't clear. You list age and income as independent variables, but they are not in your model. You also mention a reference category of category 1, even though category 1 is not in the model; I assume cat1, cat2, and cat3 are three levels of a single variable.
So anyway, here is how to handle a categorical variable with 3 levels, which I have named CAT.
To get means in this case, you can use PROC GLM, and you don't have to create the dummy variables yourself.
proc glm data=dummyfinal;
class cat(ref='1');   /* level 1 is the reference category */
model mcs42=cat;
means cat/t;          /* group means with pairwise t tests (LSD) */
run;
quit;
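If the goal is covariate-adjusted means with Tukey-Kramer comparisons, something along these lines might work. This is only a sketch: it assumes AGE and INCOME are numeric variables in dummyfinal and that CAT is the single three-level variable (my name for it, as above).

```sas
/* Sketch: adjusted (least-squares) means for CAT, controlling for age and income. */
/* Assumes CAT, AGE and INCOME all exist in dummyfinal.                            */
proc glm data=dummyfinal;
   class cat(ref='1');
   model mcs42 = cat age income / solution;
   /* STDERR: standard errors of the adjusted means; CL: confidence limits;  */
   /* PDIFF with ADJUST=TUKEY: Tukey-Kramer adjusted pairwise comparisons    */
   lsmeans cat / stderr cl pdiff adjust=tukey;
run;
quit;
```

The LSMEANS statement reports means evaluated at the average of the covariates, which is usually what "adjusted means" refers to; with unequal group sizes, ADJUST=TUKEY gives the Tukey-Kramer version.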
@uzma03505621 wrote:
I'm sorry for the typing error. The correct code is:
ods graphics on;
proc reg data=dummyfinal plots(maxpoints=none);
model mcs42=cat2 cat3 age income;
output out=new P=YHAT RSTUDENT=RESID L95M=LOW U95M=HIGH;
run;
quit;
ods graphics off;
My original variable is called category, with three values (1, 2, 3). I created dummy variables as follows:
if category=2 then cat2=1; else cat2=0;
if category=3 then cat3=1; else cat3=0;
So when I run PROC REG, category=1 will be used as the reference by default? (Please correct me if I am wrong.)
Thanks for the PROC GLM code; I have used it before. I have been told to use PROC REG only, which is why I created dummy variables. I need help with code to compare the regression-adjusted MCS means (with confidence intervals) among these three categories.
I appreciate your time and consideration.
Thank you.
I don't know how to get the means that you are asking for using PROC REG only. As you can see, it's very easy to get the means from PROC GLM.
I used dummy coding:
data dummyfinal;
set finalfile;
if category=2 then cat2=1; else cat2=0;
if category=3 then cat3=1; else cat3=0;
run;
Then I ran PROC REG (unadjusted model, without income and age) as follows:
/*unadjusted model*/
ods graphics on;
proc reg data=dummyfinal plots(maxpoints=none);
model mcs42=cat2 cat3;
output out=new P=YHAT RSTUDENT=RESID L95M=LOW U95M=HIGH;
run;
quit;
ods graphics off;
My results look like this:
Parameter Estimates
Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\|
---|---|---|---|---|---|---
Intercept | Intercept | 1 | 41.48793 | 0.26104 | 158.93 | <.0001
cat2 | | 1 | -4.81016 | 0.43795 | -10.98 | <.0001
cat3 | | 1 | 11.55753 | 0.28816 | 40.11 | <.0001
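As a worked check of these estimates (assuming the unadjusted model with the dummy coding above, where category 1 is the omitted level), the intercept is the category 1 mean and each slope is a difference from it:

```latex
\begin{aligned}
\bar{y}_{\text{cat1}} &= b_0 = 41.48793 \\
\bar{y}_{\text{cat2}} &= b_0 + b_{\text{cat2}} = 41.48793 - 4.81016 = 36.67777 \\
\bar{y}_{\text{cat3}} &= b_0 + b_{\text{cat3}} = 41.48793 + 11.55753 = 53.04546
\end{aligned}
```

The intercept's standard error (0.26104) is therefore the standard error of the category 1 mean. The standard errors shown for cat2 and cat3 belong to the differences from category 1, not to those categories' means.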
Now YHAT is my predicted dependent variable. I want to test whether the means of the predicted dependent variable differ significantly across the three categories of my independent variable.
So should I run an ANOVA with post hoc tests on this predicted YHAT, or is there another method to get output like the table below?
Unadjusted means and SE
Group | PCS Mean (SE) | PCS p value | MCS Mean (SE) | MCS p value
---|---|---|---|---
Category 1 (reference) | 37.27 (0.96) ∗∗∗ | <0.001 | 40.45 (0.94) ∗∗∗ | <0.001
Category 2 | 37.02 (0.97) ∗∗∗ | <0.001 | 43.76 (0.71) ∗∗∗ | <0.001
Category 3 | 38.38 (1.04) ∗ | 0.016 | 46.96 (0.78) ∗∗∗ | <0.001
I think the real problem here is that whoever told you PROC REG has to be used gave you bad advice; PROC GLM makes this simple.
Nevertheless, I still don't know how to do this with PROC REG. Specifically, I'm not sure how you get the CORRECT standard errors, so I cannot advise further if PROC REG has to be used.
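For the unadjusted model only, one PROC REG-only route that may be worth trying is sketched below. It is a sketch under assumptions, not a verified answer. It assumes the original CATEGORY variable is still in dummyfinal, and SE_MEAN and CAT_MEANS are names made up for illustration. The OUTPUT keyword STDP= requests the standard error of the mean predicted value, and the TEST statements give pairwise comparisons. As far as I know, PROC REG has no Tukey-Kramer adjustment, so these comparisons are unadjusted.

```sas
/* Sketch: category means, SEs and 95% limits from the unadjusted PROC REG fit. */
proc reg data=dummyfinal plots=none;
   model mcs42 = cat2 cat3;
   /* STDP= is the standard error of the mean predicted value */
   output out=new p=yhat stdp=se_mean l95m=low u95m=high;
   /* Pairwise comparisons of category means (no multiplicity adjustment) */
   c2_vs_c1: test cat2 = 0;
   c3_vs_c1: test cat3 = 0;
   c3_vs_c2: test cat3 - cat2 = 0;
run;
quit;

/* Every row in a category shares the same fitted mean, so keep one row per category */
proc sort data=new out=cat_means(keep=category yhat se_mean low high) nodupkey;
   by category;
run;

proc print data=cat_means noobs;
run;
```

With only the two dummies in the model, YHAT for each row is just that category's mean, so running an ANOVA on YHAT itself would not be meaningful. Once age and income are added, YHAT varies within a category, and this shortcut no longer gives adjusted means.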