For each obs, I have 5 variables related to race (raceA, raceB, raceP, raceI and raceW). If 4 of the variables = 'N' and one of them = 'Y' then coding the race is easy. subgrp_race will be defined by the field that has the 'Y.' However, if two or more fields = 'Y' then I will define a new variable (RaceO) to denote that two or more races were chosen by the student.
Here's a proc freq below that shows the table raceA*raceB*raceP*raceI*raceW/ list;
I know I can write a buttload of if then statements but there has to be something more elegant.
raceA | raceB | raceP | raceI | raceW | ||||
N | N | N | N | Y | 3992 | 86.31 | 3992 | 86.31 |
---|---|---|---|---|---|---|---|---|
N | N | N | Y | N | 4 | 0.09 | 3996 | 86.40 |
N | N | N | Y | Y | 21 | 0.45 | 4017 | 86.85 |
N | N | Y | N | N | 1 | 0.02 | 4018 | 86.88 |
N | N | Y | N | Y | 4 | 0.09 | 4022 | 86.96 |
N | Y | N | N | N | 385 | 8.32 | 4407 | 95.29 |
N | Y | N | N | Y | 157 | 3.39 | 4564 | 98.68 |
N | Y | N | Y | Y | 3 | 0.06 | 4567 | 98.75 |
Y | N | N | N | N | 36 | 0.78 | 4603 | 99.52 |
Y | N | N | N | Y | 16 | 0.35 | 4619 | 99.87 |
Y | N | Y | N | N | 3 | 0.06 | 4622 | 99.94 |
Y | Y | N | N | N | 1 | 0.02 | 4623 | 99.96 |
Y | Y | N | N | Y | 2 | 0.04 | 4625 | 100.00 |
Try the code below, I believe it will give you what you want if I'm understanding your description correctly:
data have;
input raceA$ raceB$ raceP$ raceI$ raceW$;
datalines;
N N N N Y
N N N Y N
N N N Y Y
N N Y N N
N N Y N Y
N Y N N N
N Y N N Y
N Y N Y Y
Y N N N N
Y N N N Y
Y N Y N N
Y Y N N N
Y Y N N Y
;
run;
proc contents data=have noprint
out=have_contents (keep=NAME);
run;
%macro race;
data _NULL_;
set have_contents end=lastobs;
call symputx(cats('race',_n_),NAME);
if lastobs then call symputx('n',_n_);
run;
data want;
set have;
combine=catx(',',raceA,raceB,raceP,raceI,raceW);
If countc(combine,"Y")>1 then do;
raceO="Y";
subgrp_race="raceO"; end;
Else do;
%do i=1 %to &n;
If &&race&i="Y" then do;
raceO="N";
subgrp_race="&&race&i"; end;
%end; end;
drop combine;
run;
%mend;
%race
Hope this helps!
Try the code below, I believe it will give you what you want if I'm understanding your description correctly:
data have;
input raceA$ raceB$ raceP$ raceI$ raceW$;
datalines;
N N N N Y
N N N Y N
N N N Y Y
N N Y N N
N N Y N Y
N Y N N N
N Y N N Y
N Y N Y Y
Y N N N N
Y N N N Y
Y N Y N N
Y Y N N N
Y Y N N Y
;
run;
proc contents data=have noprint
out=have_contents (keep=NAME);
run;
%macro race;
data _NULL_;
set have_contents end=lastobs;
call symputx(cats('race',_n_),NAME);
if lastobs then call symputx('n',_n_);
run;
data want;
set have;
combine=catx(',',raceA,raceB,raceP,raceI,raceW);
If countc(combine,"Y")>1 then do;
raceO="Y";
subgrp_race="raceO"; end;
Else do;
%do i=1 %to &n;
If &&race&i="Y" then do;
raceO="N";
subgrp_race="&&race&i"; end;
%end; end;
drop combine;
run;
%mend;
%race
Hope this helps!
Thanks!
you can simply:
data want;
set have;
o=count(cats(of race:),'Y')>1;
run;
O =1 when you have more than 'Y' selected, otherwise O=0.
Yet another way:
I read in Yes/no variables as 1 and 0 with custom informats. The Race0 = (Sum(raceA,raceB,raceP,raceI, raceW)>1);
(I also tend to name such thins Race1 through 5 so Race1-Race5 makes better lists).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.