11-07-2013 10:46 AM
Does anyone know of a quick way to convert several character variables into binary variables where the binary variables represents a yes/no for each character value?
For example, consider a dataset of student names and the letter grade they received on each quiz (with no numeric score information):
I would like to convert as follows
I realize that I could use a whole bunch of IF-THEN statements; but it would be great if there was a more efficient way, especially since I need to do it for 70+ variables that have up to 20 character values apiece.
11-08-2013 08:55 AM
PROC GLMMOD will do this. You'll need a dummy response variable (which you can use later to merge the results back against the original dataset).
proc glmmod data=have outparm=parm;
class quiz1 quiz2 quiz3;
model resp=quiz1 quiz2 quiz3/noint;
create table want as select a.student, b.* from have a, parm b
This will generate only columns for grades that were actually scored on the quizzes. If there is a grade that no student achieved, it can be added in a data step.
Message was edited by: Steve Denham
11-08-2013 09:04 AM
I wonder since the OP wants the variables named using source-variable-name+value that the features of PROC TRANSREG DESIGN might be a more direct path.
I wonder about 70 variables that will be lots of dummies.
You could add some dummy students to insure that all grade levels are represented.
11-08-2013 09:32 AM
I agree, but if the OP only needs labels attached to the variables named col1-coln, this will work.
With TRANSREG, need to drop Intercept, _TYPE_ and _NAME_. The following code will work:
proc transreg data=have design;
model class(quiz1 quiz2 quiz3);
About the 70 variables, if that is for 70 quizzes, it won't be so hard to code given they all have the same name, but if they all have different names, it might be easier to put them into a macro string variable.
11-08-2013 02:54 PM
Thanks, Rick. Edited now. Sometimes fingers type what the muscle memory is, rather than what you really want...
11-08-2013 03:27 PM
You have a bunch of solutions as to how, I'm curious as to why?
I'd bet (not my salary ) that there's a better way to do whatever you're trying to do with this.
11-08-2013 04:23 PM
I used to deal with a survey data collection package that defaulted to the output of binary output such as this for each level of a categorical variable. We were forever doing the reverse of this process to get back to the A,B,C .. values. So I also wonder what the next steps are unless this is just an exercise in data manipulation.