I'm not new to SAS, but new to using arrays. I have a SAS table set up like this except for the final_race variable, which is what i want:
patient_id | race_white | race_black | race_asian | race_am_indian | race_pac_islander | race_other | race_unk | final_race |
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | white |
2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | multiracial |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown |
4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | asian |
A 1 means a person is of that race. Some people could have 1s in multiple columns because they're multiracial. Some have all 0s meaning we don't know their race. I need to sort through all the combos.. if a person has a 1 for only one race, then I need final_race to be that race. Else, if there are multiple 1s, then final_race "Multiracial". If all 0s, then "Unknown". I tried to build something off https://communities.sas.com/t5/SAS-Procedures/Race-looping/td-p/182117
but I couldn't figure it out with the differences I need.
Thank you
Seems simple as long as the variable names actually match the values you want calculate.
data have;
input patient_id
race_white race_black race_asian race_am_indian race_pac_islander race_other race_unk
want :$20.
;
cards;
1 1 0 0 0 0 0 0 white
2 1 1 0 0 0 0 0 multiracial
3 0 0 0 0 0 0 0 unknown
4 0 0 1 0 0 0 0 asian
;
data want;
set have ;
length final_race $20;
array race race_: ;
select (sum(of race[*]));
when (.,0) final_race='unknown';
when (1) final_race=substr(vname(race[ whichn(1,of race[*]) ]),6);
otherwise final_race='multiracial';
end;
run;
/* UNTESTED CODE */
data want;
set have;
length final_race $ 12;
sum_races=sum(of race_whte--race_other);
if sum_races>1 then final_race='multiracial';
else if sum_races=0 then final_race='unknown';
else do;
if race_white=1 then race='white';
/* You type the rest, I'm lazy */
end;
drop sum_races;
run;
Alternative (UNTESTED)
data want;
set have;
length final_race $ 12;
array race race_white--race_other;
sum_races=sum(of race_white--race_other);
if sum_races>1 then final_race='multiracial';
else if sum_races=0 then final_race='unknown';
else do i=1 to dim(race);
if race(i)=1 then race=scan(vname(race(i)),2,'_');
end;
drop sum_races;
run;
Seems simple as long as the variable names actually match the values you want calculate.
data have;
input patient_id
race_white race_black race_asian race_am_indian race_pac_islander race_other race_unk
want :$20.
;
cards;
1 1 0 0 0 0 0 0 white
2 1 1 0 0 0 0 0 multiracial
3 0 0 0 0 0 0 0 unknown
4 0 0 1 0 0 0 0 asian
;
data want;
set have ;
length final_race $20;
array race race_: ;
select (sum(of race[*]));
when (.,0) final_race='unknown';
when (1) final_race=substr(vname(race[ whichn(1,of race[*]) ]),6);
otherwise final_race='multiracial';
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.