Solved: Coding Race Variables

GreggB · Posted 07-20-2015 02:02 PM

For each obs, I have 5 variables related to race (raceA, raceB, raceP, raceI and raceW). If 4 of the variables = 'N' and one of them = 'Y' then coding the race is easy. subgrp_race will be defined by the field that has the 'Y.' However, if two or more fields = 'Y' then I will define a new variable (RaceO) to denote that two or more races were chosen by the student.

Here's a proc freq below that shows the table raceA*raceB*raceP*raceI*raceW/ list;

I know I can write a buttload of if then statements but there has to be something more elegant.

raceA	raceB	raceP	raceI	raceW
N	N	N	N	Y	3992	86.31	3992	86.31
N	N	N	Y	N	4	0.09	3996	86.40
N	N	N	Y	Y	21	0.45	4017	86.85
N	N	Y	N	N	1	0.02	4018	86.88
N	N	Y	N	Y	4	0.09	4022	86.96
N	Y	N	N	N	385	8.32	4407	95.29
N	Y	N	N	Y	157	3.39	4564	98.68
N	Y	N	Y	Y	3	0.06	4567	98.75
Y	N	N	N	N	36	0.78	4603	99.52
Y	N	N	N	Y	16	0.35	4619	99.87
Y	N	Y	N	N	3	0.06	4622	99.94
Y	Y	N	N	N	1	0.02	4623	99.96
Y	Y	N	N	Y	2	0.04	4625	100.00

dcruik · Posted 07-20-2015 02:44 PM

Try the code below, I believe it will give you what you want if I'm understanding your description correctly:

data have;

input raceA$ raceB$ raceP$ raceI$ raceW$;

datalines;

N N N N Y

N N N Y N

N N N Y Y

N N Y N N

N N Y N Y

N Y N N N

N Y N N Y

N Y N Y Y

Y N N N N

Y N N N Y

Y N Y N N

Y Y N N N

Y Y N N Y

;

run;

proc contents data=have noprint

out=have_contents (keep=NAME);

run;

%macro race;

data _NULL_;

set have_contents end=lastobs;

call symputx(cats('race',_n_),NAME);

if lastobs then call symputx('n',_n_);

run;

data want;

set have;

combine=catx(',',raceA,raceB,raceP,raceI,raceW);

If countc(combine,"Y")>1 then do;

raceO="Y";

subgrp_race="raceO"; end;

Else do;

%do i=1 %to &n;

If &&race&i="Y" then do;

raceO="N";

subgrp_race="&&race&i"; end;

%end; end;

drop combine;

run;

%mend;

%race

Hope this helps!

View solution in original post

dcruik · Posted 07-20-2015 02:44 PM

Try the code below, I believe it will give you what you want if I'm understanding your description correctly:

data have;

input raceA$ raceB$ raceP$ raceI$ raceW$;

datalines;

N N N N Y

N N N Y N

N N N Y Y

N N Y N N

N N Y N Y

N Y N N N

N Y N N Y

N Y N Y Y

Y N N N N

Y N N N Y

Y N Y N N

Y Y N N N

Y Y N N Y

;

run;

proc contents data=have noprint

out=have_contents (keep=NAME);

run;

%macro race;

data _NULL_;

set have_contents end=lastobs;

call symputx(cats('race',_n_),NAME);

if lastobs then call symputx('n',_n_);

run;

data want;

set have;

combine=catx(',',raceA,raceB,raceP,raceI,raceW);

If countc(combine,"Y")>1 then do;

raceO="Y";

subgrp_race="raceO"; end;

Else do;

%do i=1 %to &n;

If &&race&i="Y" then do;

raceO="N";

subgrp_race="&&race&i"; end;

%end; end;

drop combine;

run;

%mend;

%race

Hope this helps!

GreggB · Posted 07-20-2015 02:49 PM

Thanks!

Haikuo · Posted 07-20-2015 02:51 PM

you can simply:

data want;

set have;

o=count(cats(of race:),'Y')>1;

run;

O =1 when you have more than 'Y' selected, otherwise O=0.

ballardw · Posted 07-20-2015 04:10 PM

Yet another way:

I read in Yes/no variables as 1 and 0 with custom informats. The Race0 = (Sum(raceA,raceB,raceP,raceI, raceW)>1);

(I also tend to name such thins Race1 through 5 so Race1-Race5 makes better lists).

Coding Race Variables

Re: Coding Race Variables

Re: Coding Race Variables

Re: Coding Race Variables

Re: Coding Race Variables

Re: Coding Race Variables

Coding Race Variables

Re: Coding Race Variables

Re: Coding Race Variables

Re: Coding Race Variables

Re: Coding Race Variables

Re: Coding Race Variables

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away