Does anyone know of a quick way to convert several character variables into binary variables where the binary variables represents a yes/no for each character value?
For example, consider a dataset of student names and the letter grade they received on each quiz (with no numeric score information):
Student | Quiz1 | Quiz2 | Quiz3 |
---|---|---|---|
Jack | A | C | D |
Jill | B | A | A |
John | D | F | C |
Jane | B | B | F |
I would like to convert as follows
Student | Quiz1 | Quiz2 | Quiz3 | Quiz1_A | Quiz1_B | Quiz1_C | Quiz1_D | Quiz1_F | ... |
---|---|---|---|---|---|---|---|---|---|
Jack | A | C | D | 1 | 0 | 0 | 0 | 0 | ... |
Jill | B | A | A | 0 | 1 | 0 | 0 | 0 | ... |
John | D | F | C | 0 | 0 | 0 | 1 | 0 | ... |
Jane | B | B | F | 0 | 1 | 0 | 0 | 0 | ... |
I realize that I could use a whole bunch of IF-THEN statements; but it would be great if there was a more efficient way, especially since I need to do it for 70+ variables that have up to 20 character values apiece.
Thanks!
-Lee
Try using arrays.
Set up a character array and a corresponding numerical array.
Hope this helps.
Brandy
PROC GLMMOD will do this. You'll need a dummy response variable (which you can use later to merge the results back against the original dataset).
data have;
set have;
resp=_n_;
run;
proc glmmod data=have outparm=parm;
class quiz1 quiz2 quiz3;
model resp=quiz1 quiz2 quiz3/noint;
run;
proc sql
create table want as select a.student, b.* from have a, parm b
where a.resp=b.resp;
quit;
This will generate only columns for grades that were actually scored on the quizzes. If there is a grade that no student achieved, it can be added in a data step.
Steve Denham
Message was edited by: Steve Denham
I wonder since the OP wants the variables named using source-variable-name+value that the features of PROC TRANSREG DESIGN might be a more direct path.
I wonder about 70 variables that will be lots of dummies.
You could add some dummy students to insure that all grade levels are represented.
I agree, but if the OP only needs labels attached to the variables named col1-coln, this will work.
With TRANSREG, need to drop Intercept, _TYPE_ and _NAME_. The following code will work:
proc transreg data=have design;
model class(quiz1 quiz2 quiz3);
id student;
output out=coded;
run;
About the 70 variables, if that is for 70 quizzes, it won't be so hard to code given they all have the same name, but if they all have different names, it might be easier to put them into a macro string variable.
Steve Denham
I have patients' medication data, need to convert it to binary variables. Do you have any solutions?
what I have:
Patient | Medication1 | Medication2 | Medication3 |
ID1 | A | B | C |
ID2 | C | E | F |
ID3 | E | G | H |
what I want:
Patient | A | B | C | D | E | F | G | H |
ID1 | 1 | 1 | 1 | |||||
ID2 | 1 | 1 | 1 | |||||
ID3 | 1 | 1 | 1 |
Thanks, Rick. Edited now. Sometimes fingers type what the muscle memory is, rather than what you really want...
Steve Denham
You have a bunch of solutions as to how, I'm curious as to why?
I'd bet (not my salary ) that there's a better way to do whatever you're trying to do with this.
I used to deal with a survey data collection package that defaulted to the output of binary output such as this for each level of a categorical variable. We were forever doing the reverse of this process to get back to the A,B,C .. values. So I also wonder what the next steps are unless this is just an exercise in data manipulation.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.