I need to perform regression on binary variables. I have converted the yes and no to 1 and 0 in the code below:
Data MAT_Fixed;
Set MAT;
If school='GP' then school=1;
else if school='MS' then school=0;
if sex='M' then sex=1;
else if sex='F' then sex=0;
if address='U' then address=1;
else if address='R' then address=0;
if famsize='LE3' then famsize=1;
else if famsize='GT3' then famsize=0;
if Pstatus='T' then Pstatus=1;
else if Pstatus='A' then Pstatus=0;
if schoolsup='yes' then schoolsup=1;
else if schoolsup='no' then schoolsup=0;
if famsup='yes' then famsup=1;
else if famsup='no' then famsup=0;
if paid='yes' then paid=1;
else if paid='no' then paid=0;
if activities='yes' then activities=1;
else if activities='no' then activities=0;
if nursery='yes' then nursery=1;
else if nursery='no' then nursery=0;
if higher='yes' then higher=1;
else if higher='no' then higher=0;
if internet='yes' then internet=1;
else if internet='no' then internet=0;
if romantic='yes' then romantic=1;
else if romantic='no' then romantic=0;
run;
Now I need to perform regression with the dependent variable G3, which is a numeric variable. I have attached the data sheet to help show the situation. I have tried proc reg but I get these errors:
ERROR: Variable school in list does not match type prescribed for this list.
ERROR: Variable address in list does not match type prescribed for this list.
ERROR: Variable famsize in list does not match type prescribed for this list.
ERROR: Variable Pstatus in list does not match type prescribed for this list.
ERROR: Variable schoolsup in list does not match type prescribed for this list.
ERROR: Variable famsup in list does not match type prescribed for this list.
ERROR: Variable paid in list does not match type prescribed for this list.
ERROR: Variable activities in list does not match type prescribed for this list.
ERROR: Variable nursery in list does not match type prescribed for this list.
ERROR: Variable higher in list does not match type prescribed for this list.
ERROR: Variable internet in list does not match type prescribed for this list.
ERROR: Variable romantic in list does not match type prescribed for this list.
I think I need to use proc glm but I need the p-values for 0 and 1 for all the variables. Any ideas?
Two comments:
1. You cannot change the type of a variable. When you say
If school='GP' then school=1;
the data step will convert the number 1 to the character '1' (and there should be a NOTE in the log). Because PROC REG does not accept character variables, you get the errors that you report.
2. You do not need to create dummy variables yourself. In almost every SAS regression procedure (except PROC REG), you can use the CLASS statement to generate dummy variables automatically.
Thus you can run PROC GLM:
proc glm data=MAT;
class school sex famsize; /* ETC */
model G3 = school sex famsize /* ETC */;
run;
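As a fuller sketch of the same idea, the SOLUTION option on the MODEL statement asks PROC GLM to print the parameter estimates with their t tests and p-values. This assumes the original dataset MAT, with the character variables left as they are; only three of the predictors from the question are shown, and the rest would be added to both statements in the same way.

```sas
/* Sketch only: MAT and the variable names come from the question. */
proc glm data=MAT;
   /* CLASS tells GLM to build the dummy variables internally */
   class school sex famsize;
   /* SOLUTION prints parameter estimates, t values, and p-values */
   model G3 = school sex famsize / solution;
run;
quit;
```

For a two-level CLASS variable, one level serves as the reference and its estimate is set to 0, so there is a single p-value per variable rather than one each for "0" and "1". That p-value should match the Type III test for the variable, since each test has one degree of freedom.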
You don't say much about how you are doing this regression, and I hate having to make assumptions, but are you using PROC REG and getting these errors? If so, you can't have character variables in PROC REG, and code like this does not work anyway:
If school='GP' then school=1;
else if school='MS' then school=0;
You should get a message in the LOG when you run this (a NOTE that numeric values have been converted to character values), because you can't change SCHOOL from character to numeric in a data step. So in PROC REG, SCHOOL is still character and can't be used. You could use
If school='GP' then school1=1;
else if school='MS' then school1=0;
and then SCHOOL1 is numeric and can be used in PROC REG.
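Extending that to a few more variables, a minimal sketch might look like the following. The `_n` names are hypothetical (any unused names would do), and only three of the variables from the question are shown. The comparisons are case-sensitive, so the quoted values must match the data exactly; any other value leaves the new variable missing.

```sas
/* Sketch only: school_n, sex_n, schoolsup_n are hypothetical new names. */
data MAT_Fixed;
   set MAT;
   /* Assign to NEW variables so they are created as numeric;
      the original character variables are left unchanged. */
   if school='GP' then school_n=1;
   else if school='MS' then school_n=0;
   if sex='M' then sex_n=1;
   else if sex='F' then sex_n=0;
   if schoolsup='yes' then schoolsup_n=1;
   else if schoolsup='no' then schoolsup_n=0;
   /* ...repeat for the remaining character variables... */
run;

proc reg data=MAT_Fixed;
   /* Only numeric variables appear in the MODEL statement */
   model G3 = school_n sex_n schoolsup_n;
run;
quit;
```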
So, the lesson here is: please look at your LOG and address the messages there.
You wrote: "I think I need to use proc glm but I need the p-values for 0 and 1 for all the variables."
GLM does provide p-values, so really you don't need to create numeric variables as you are trying to do above, and you can run your character variables through PROC GLM.