Convert character variable into numeric binary variables for each character value

Reply
Contributor
Posts: 20

Convert character variable into numeric binary variables for each character value

Does anyone know of a quick way to convert several character variables into binary variables where the binary variables represents a yes/no for each character value?

For example, consider a dataset of student names and the letter grade they received on each quiz (with no numeric score information):

StudentQuiz1Quiz2Quiz3
JackACD
JillBAA
JohnDFC
JaneBBF

I would like to convert as follows

StudentQuiz1Quiz2Quiz3Quiz1_AQuiz1_BQuiz1_CQuiz1_DQuiz1_F...
JackACD10000...
JillBAA01000...
JohnDFC00010...
JaneBBF01000...

I realize that I could use a whole bunch of IF-THEN statements; but it would be great if there was a more efficient way, especially since I need to do it for 70+ variables that have up to 20 character values apiece.

Thanks!

-Lee

Occasional Contributor
Posts: 7

Re: Convert character variable into numeric binary variables for each character value

Posted in reply to leeklammer

Try using arrays.

Set up a character array and a corresponding numerical array.

Hope this helps.

Brandy

Respected Advisor
Posts: 2,655

Re: Convert character variable into numeric binary variables for each character value

PROC GLMMOD will do this.  You'll need a dummy response variable (which you can use later to merge the results back against the original dataset).

data have;

set have;

resp=_n_;

run;

proc glmmod data=have outparm=parm;

class quiz1 quiz2 quiz3;

model resp=quiz1 quiz2 quiz3/noint;

run;

proc sql

create table want as select a.student, b.* from have a, parm b

where a.resp=b.resp;

quit;

This will generate only columns for grades that were actually scored on the quizzes.  If there is a grade that no student achieved, it can be added in a data step.

Steve Denham

Message was edited by: Steve Denham

Respected Advisor
Posts: 3,799

Re: Convert character variable into numeric binary variables for each character value

Posted in reply to SteveDenham

I wonder since the OP wants the variables named using source-variable-name+value that the features of PROC TRANSREG DESIGN might be a more direct path.

I wonder about 70 variables that will be lots of dummies.

You could add some dummy students to insure that all grade levels are represented.

Respected Advisor
Posts: 2,655

Re: Convert character variable into numeric binary variables for each character value

Posted in reply to data_null__

I agree, but if the OP only needs labels attached to the variables named col1-coln, this will work.

With TRANSREG, need to drop Intercept,  _TYPE_ and _NAME_.  The following code will work:

proc transreg data=have design;

model class(quiz1 quiz2 quiz3);

id student;

output out=coded;

run;

About the 70 variables, if that is for 70 quizzes, it won't be so hard to code given they all have the same name, but if they all have different names, it might be easier to put them into a macro string variable.

Steve Denham

SAS Super FREQ
Posts: 3,754

Re: Convert character variable into numeric binary variables for each character value

Posted in reply to SteveDenham

You said GLMMOD, but you wrote GLIMMIX.

Respected Advisor
Posts: 2,655

Re: Convert character variable into numeric binary variables for each character value

Thanks, Rick.  Edited now.  Sometimes fingers type what the muscle memory is, rather than what you really want...

Steve Denham

Super User
Posts: 19,832

Re: Convert character variable into numeric binary variables for each character value

Posted in reply to leeklammer

You have a bunch of solutions as to how, I'm curious as to why?

I'd bet (not my salary Smiley Wink) that there's a better way to do whatever you're trying to do with this.

Super User
Posts: 11,343

Re: Convert character variable into numeric binary variables for each character value

I used to deal with a survey data collection package that defaulted to the output of binary output such as this for each level of a categorical variable. We were forever doing the reverse of this process to get back to the A,B,C .. values. So I also wonder what the next steps are unless this is just an exercise in data manipulation.

Ask a Question
Discussion stats
  • 8 replies
  • 3657 views
  • 0 likes
  • 7 in conversation