For each obs, I have 5 variables related to race (raceA, raceB, raceP, raceI and raceW). If 4 of the variables = 'N' and one of them = 'Y' then coding the race is easy. subgrp_race will be defined by the field that has the 'Y.' However, if two or more fields = 'Y' then I will define a new variable (RaceO) to denote that two or more races were chosen by the student.
Here's a proc freq below that shows the table raceA*raceB*raceP*raceI*raceW/ list;
I know I can write a buttload of if then statements but there has to be something more elegant.
raceA | raceB | raceP | raceI | raceW | ||||
N | N | N | N | Y | 3992 | 86.31 | 3992 | 86.31 |
---|---|---|---|---|---|---|---|---|
N | N | N | Y | N | 4 | 0.09 | 3996 | 86.40 |
N | N | N | Y | Y | 21 | 0.45 | 4017 | 86.85 |
N | N | Y | N | N | 1 | 0.02 | 4018 | 86.88 |
N | N | Y | N | Y | 4 | 0.09 | 4022 | 86.96 |
N | Y | N | N | N | 385 | 8.32 | 4407 | 95.29 |
N | Y | N | N | Y | 157 | 3.39 | 4564 | 98.68 |
N | Y | N | Y | Y | 3 | 0.06 | 4567 | 98.75 |
Y | N | N | N | N | 36 | 0.78 | 4603 | 99.52 |
Y | N | N | N | Y | 16 | 0.35 | 4619 | 99.87 |
Y | N | Y | N | N | 3 | 0.06 | 4622 | 99.94 |
Y | Y | N | N | N | 1 | 0.02 | 4623 | 99.96 |
Y | Y | N | N | Y | 2 | 0.04 | 4625 | 100.00 |
Try the code below, I believe it will give you what you want if I'm understanding your description correctly:
data have;
input raceA$ raceB$ raceP$ raceI$ raceW$;
datalines;
N N N N Y
N N N Y N
N N N Y Y
N N Y N N
N N Y N Y
N Y N N N
N Y N N Y
N Y N Y Y
Y N N N N
Y N N N Y
Y N Y N N
Y Y N N N
Y Y N N Y
;
run;
proc contents data=have noprint
out=have_contents (keep=NAME);
run;
%macro race;
data _NULL_;
set have_contents end=lastobs;
call symputx(cats('race',_n_),NAME);
if lastobs then call symputx('n',_n_);
run;
data want;
set have;
combine=catx(',',raceA,raceB,raceP,raceI,raceW);
If countc(combine,"Y")>1 then do;
raceO="Y";
subgrp_race="raceO"; end;
Else do;
%do i=1 %to &n;
If &&race&i="Y" then do;
raceO="N";
subgrp_race="&&race&i"; end;
%end; end;
drop combine;
run;
%mend;
%race
Hope this helps!
Try the code below, I believe it will give you what you want if I'm understanding your description correctly:
data have;
input raceA$ raceB$ raceP$ raceI$ raceW$;
datalines;
N N N N Y
N N N Y N
N N N Y Y
N N Y N N
N N Y N Y
N Y N N N
N Y N N Y
N Y N Y Y
Y N N N N
Y N N N Y
Y N Y N N
Y Y N N N
Y Y N N Y
;
run;
proc contents data=have noprint
out=have_contents (keep=NAME);
run;
%macro race;
data _NULL_;
set have_contents end=lastobs;
call symputx(cats('race',_n_),NAME);
if lastobs then call symputx('n',_n_);
run;
data want;
set have;
combine=catx(',',raceA,raceB,raceP,raceI,raceW);
If countc(combine,"Y")>1 then do;
raceO="Y";
subgrp_race="raceO"; end;
Else do;
%do i=1 %to &n;
If &&race&i="Y" then do;
raceO="N";
subgrp_race="&&race&i"; end;
%end; end;
drop combine;
run;
%mend;
%race
Hope this helps!
Thanks!
you can simply:
data want;
set have;
o=count(cats(of race:),'Y')>1;
run;
O =1 when you have more than 'Y' selected, otherwise O=0.
Yet another way:
I read in Yes/no variables as 1 and 0 with custom informats. The Race0 = (Sum(raceA,raceB,raceP,raceI, raceW)>1);
(I also tend to name such thins Race1 through 5 so Race1-Race5 makes better lists).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.