Hi there,
I want to create a dummy variable for gender (m=0, f=1) so that I can use proc reg for a bivariate analysis. However, the error message keeps indicating that my variable doesn't exist. Any insight would be useful 🙂
data eva.cohort;
set eva.finalcohort;
if sex="m" then sexm=0;
if sex="f" then sexf=1;
avgvol1=sum(avg_2009+avg_2011+avg_2012+avg_2014)/4;
avgvol2=sum(bee_avg_2009+bee_avg_2011+bee_avg_2012+bee_avg_2014)/4;
run;
proc means data=eva.cohort n nmiss mean median max min;
var avgvol1 avgvol2;
run;
proc reg data=eva.cohort;
model avgvol1=sexm sexf;
run;
Perhaps your actual data doesn't contain "m" and "f". Perhaps it contains "M" and "F" instead.
At any rate, you would be well advised to treat SEX as a CLASS variable within PROC REG. Most regression procedures will automatically create the proper dummy variables when you use a CLASS statement.
Note for the future: instead of posting the program, post the log so we can see what message applies to what step.
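On the first point, a quick way to see what SEX actually contains is a frequency table. This is only a sketch: it assumes SEX is a character variable in eva.finalcohort, and work.cohort_check and sexDum are illustrative names, not anything from the original program.

```sas
/* List every distinct value of SEX (including missing) to check its exact case. */
proc freq data=eva.finalcohort;
  tables sex / missing;
run;

/* Hypothetical case-insensitive recode into a single 0/1 dummy. */
/* Values other than m/M or f/F leave sexDum missing.            */
data work.cohort_check;
  set eva.finalcohort;
  if upcase(strip(sex)) = "M" then sexDum = 0;
  else if upcase(strip(sex)) = "F" then sexDum = 1;
run;
```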
@Astounding wrote:
At any rate, you would be well advised to treat SEX as a CLASS variable within PROC REG. Most regression procedures will automatically create the proper dummy variables when you use a CLASS statement.
PROC GLM, not PROC REG. PROC REG does not have a CLASS statement.
To @kthartma: there is no need to create your own dummy variables. PROC GLM will create them for you, and also avoid the programming error you are having.
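A minimal sketch of what that could look like, assuming the data set and variable names from the original post (the SOLUTION option is my addition; it just prints the parameter estimates):

```sas
/* SEX goes on the CLASS statement, so PROC GLM builds the dummy coding itself. */
proc glm data=eva.cohort;
  class sex;
  model avgvol1 = sex / solution;  /* SOLUTION prints the parameter estimates */
run;
quit;
```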
Thanks so much! I ended up using proc glm.
You're also not using the SUM() function as it's usually intended.
It's typically used when you want missing values to be ignored (treated as 0). Your approach doesn't do that, because you've joined the items with + rather than separating them with commas, so the result is missing whenever any one of the values is missing.
Test your code with the following:
avgvol1=sum(avg_2009+avg_2011+avg_2012+avg_2014)/4;
avgvol1_check0 =sum(avg_2009, avg_2011, avg_2012, avg_2014)/4;
avgvol1_check1 = sum(of avg_2009-avg_2014)/ 4;
avgvol1_check2 = mean(of avg_2009-avg_2014);
If any of these calculations give different results, you have an issue, most likely missing values in the inputs.
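To see why the results can differ, here is a small self-contained sketch with made-up values (a, b, c, d are hypothetical, with one of them missing):

```sas
/* Made-up values: b is missing. */
data _null_;
  a = 10; b = .; c = 30; d = 20;
  plus_inside = sum(a + b + c + d) / 4;  /* a+b+c+d is already missing, so this is missing */
  with_commas = sum(a, b, c, d) / 4;     /* missing ignored: 60 / 4 = 15 */
  with_mean   = mean(a, b, c, d);        /* divides by the non-missing count: 60 / 3 = 20 */
  put plus_inside= with_commas= with_mean=;
run;
```

Also note that the range list avg_2009-avg_2014 names avg_2010 and avg_2013 as well; if those don't exist in your data, listing the four variables explicitly is the safer choice.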
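If you insist on using proc reg, use a single dummy variable. Your original code created two variables, each of which is missing for one of the two genders, so every observation has a missing regressor and proc reg has nothing to work with. Your code should read: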
data eva.cohort;
set eva.finalcohort;
if sex="m" then sexDum=0;
if sex="f" then sexDum=1;
avgvol1 = (avg_2009 + avg_2011 + avg_2012 + avg_2014) / 4;
avgvol2 = (bee_avg_2009 + bee_avg_2011 + bee_avg_2012 + bee_avg_2014) / 4;
run;
proc means data=eva.cohort n nmiss mean median max min;
var avgvol1 avgvol2;
run;
proc reg data=eva.cohort;
model avgvol1 = sexDum;
run;