## Coding Race Variables

Solved
Super Contributor
Posts: 286

# Coding Race Variables

For each obs, I have 5 variables related to race (raceA, raceB, raceP, raceI and raceW).  If 4 of the variables = 'N' and one of them = 'Y' then coding the race is easy.  subgrp_race will be defined by the field that has the 'Y.'  However, if two or more fields = 'Y' then I will define a new variable (RaceO) to denote that two or more races were chosen by the student.

Here's a proc freq below that shows the table raceA*raceB*raceP*raceI*raceW/ list;

I know I can write a buttload of if then statements but there has to be something more elegant.

N N N N Y raceA raceB raceP raceI raceW 3992 86.31 3992 86.31 4 0.09 3996 86.40 21 0.45 4017 86.85 1 0.02 4018 86.88 4 0.09 4022 86.96 385 8.32 4407 95.29 157 3.39 4564 98.68 3 0.06 4567 98.75 36 0.78 4603 99.52 16 0.35 4619 99.87 3 0.06 4622 99.94 1 0.02 4623 99.96 2 0.04 4625 100.00

Accepted Solutions
Solution
‎07-20-2015 02:44 PM
Frequent Contributor
Posts: 132

## Re: Coding Race Variables

Try the code below, I believe it will give you what you want if I'm understanding your description correctly:

data have;

input raceA\$ raceB\$ raceP\$ raceI\$ raceW\$;

datalines;

N N N N Y

N N N Y N

N N N Y Y

N N Y N N

N N Y N Y

N Y N N N

N Y N N Y

N Y N Y Y

Y N N N N

Y N N N Y

Y N Y N N

Y Y N N N

Y Y N N Y

;

run;

proc contents data=have noprint

out=have_contents (keep=NAME);

run;

%macro race;

data _NULL_;

set have_contents end=lastobs;

call symputx(cats('race',_n_),NAME);

if lastobs then call symputx('n',_n_);

run;

data want;

set have;

combine=catx(',',raceA,raceB,raceP,raceI,raceW);

If countc(combine,"Y")>1 then do;

raceO="Y";

subgrp_race="raceO"; end;

Else do;

%do i=1 %to &n;

If &&race&i="Y" then do;

raceO="N";

subgrp_race="&&race&i"; end;

%end; end;

drop combine;

run;

%mend;

%race

Hope this helps!

All Replies
Solution
‎07-20-2015 02:44 PM
Frequent Contributor
Posts: 132

## Re: Coding Race Variables

Try the code below, I believe it will give you what you want if I'm understanding your description correctly:

data have;

input raceA\$ raceB\$ raceP\$ raceI\$ raceW\$;

datalines;

N N N N Y

N N N Y N

N N N Y Y

N N Y N N

N N Y N Y

N Y N N N

N Y N N Y

N Y N Y Y

Y N N N N

Y N N N Y

Y N Y N N

Y Y N N N

Y Y N N Y

;

run;

proc contents data=have noprint

out=have_contents (keep=NAME);

run;

%macro race;

data _NULL_;

set have_contents end=lastobs;

call symputx(cats('race',_n_),NAME);

if lastobs then call symputx('n',_n_);

run;

data want;

set have;

combine=catx(',',raceA,raceB,raceP,raceI,raceW);

If countc(combine,"Y")>1 then do;

raceO="Y";

subgrp_race="raceO"; end;

Else do;

%do i=1 %to &n;

If &&race&i="Y" then do;

raceO="N";

subgrp_race="&&race&i"; end;

%end; end;

drop combine;

run;

%mend;

%race

Hope this helps!

Super Contributor
Posts: 286

Thanks!

Posts: 3,167

## Re: Coding Race Variables

you can simply:

data want;

set have;

o=count(cats(of race,'Y')>1;

run;

O =1 when you have more than 'Y' selected, otherwise O=0.

Super User
Posts: 13,542

## Re: Coding Race Variables

Yet another way:

I read in Yes/no variables as 1 and 0 with custom informats. The Race0 = (Sum(raceA,raceB,raceP,raceI, raceW)>1);

(I also tend to name such thins Race1 through 5 so Race1-Race5 makes better lists).

🔒 This topic is solved and locked.