I have a long dataset of over a million records of people with their race category. Some people have more than one race assigned to their person id. What I want to do is select the best race for each person id. In the dataset some people will have an observation that says unknown or other and also actual race category, I want to get rid of the unknown/other observation and keep the more descriptive race category. Some people will also have different races listed. For example person A will have one observation that says black and one observation that says white. For those I want to be able to assign that person's race to be multiracial. I was trying to do this using a Do Until statement but it is not outputting what I want.
proc sort data=race_check3; ; by child_id; run;
Data race_check4;
Do until (last.child_id);
set race_check3;
by child_id;
if (race_eth NE 'Unknown') then race_eth2=race_eth;
end;
run;
Data have:
Child_id | Race_eth |
16849918029 | NH Asian/Native Hawaiian/Other Pacific Islands |
16849918029 | NH White |
16850191831 | NH White |
16850191831 | Unknown |
16850404097 | NH Asian/Native Hawaiian/Other Pacific Islands |
16850404097 | Unknown |
16850502704 | NH Asian/Native Hawaiian/Other Pacific Islands |
16850502704 | NH White |
16850783900 | Other |
16850783900 | Unknown |
16850922663 | NH Asian/Native Hawaiian/Other Pacific Islands |
16850922663 | Unknown |
16851203403 | NH Asian/Native Hawaiian/Other Pacific Islands |
16851203403 | NH White |
16851234942 | NH White |
16851234942 | Unknown |
Data Want
Child_id | Race_eth |
16849918029 | Multi Racial |
16850191831 | NH White |
16850404097 | NH Asian/Native Hawaiian/Other Pacific Islands |
16850502704 | Multi Racial |
16850783900 | Other |
16850922663 | NH Asian/Native Hawaiian/Other Pacific Islands |
16851203403 | Multi Racial |
16851234942 | NH White |
Something like this, perhaps:
Data race_check4;
Do until (last.child_id);
set race_check3;
by child_id;
length race_eth2 $100;
if (race_eth NE 'Unknown') then do;
if race_eth2=' ' then
race_eth2=race_eth;
else if race_eth2 ne race_eth then
race_eth2='Multi Racial';
end;
end;
if race_eth2=' ' then
race_eth2='Unknown';
drop race_eth;
rename race_eth2=race_eth;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.