I am trying to create a single variable for race. The survey software used to collect this information stored each separate race as a different variable. The code below is not correctly identifying people as Native American or even mixed. Any help would be great! I am sure there is some rule I overlooked that is responsible for this.
if complete=1 then do;
/*white*/if racea=0 & raceb=0 & racec=0 & raced=0 & racee=1 then race_c=0;
/*black*/else if racea=0 & raceb=0 & racec=1 & raced=0 & racee=0 then race_c=1;
/*asian*/else if racea=0 & raceb=1 & racec=0 & raced=0 & racee=0 then race_c=2;
/*american indian*/else if racea=1 & raceb=0 & racec=0 & raced=0 & racee=0 then race_c=3;
/*native hawaiian*/else if racea=0 & raceb=0 & racec=0 & raced=1 & racee=0 then race_c=4;
/*mixed*/ else if (racea+raceb+racec+raced+racee)>1 then race_c=5;
/*other*/ else if (racea+raceb+racec+raced+racee)=0 then race_c=6;
end;
Your code looks workable. Take an example or two where you are getting the wrong answer and look at the values of the incoming variables. Surely it will become obvious at that point.
It would help to provide an example of the data where this isn't working.
This works as intended:
data junk;
input racea raceb racec raced racee;
/*white*/ if racea=0 & raceb=0 & racec=0 & raced=0 & racee=1 then race_c=0;
/*black*/ else if racea=0 & raceb=0 & racec=1 & raced=0 & racee=0 then race_c=1;
/*asian*/ else if racea=0 & raceb=1 & racec=0 & raced=0 & racee=0 then race_c=2;
/*american indian*/else if racea=1 & raceb=0 & racec=0 & raced=0 & racee=0 then race_c=3;
/*native hawaiian*/ else if racea=0 & raceb=0 & racec=0 & raced=1 & racee=0 then race_c=4;
/*mixed*/ else if (racea+raceb+racec+raced+racee)>1 then race_c=5;
/*other*/ else if (racea+raceb+racec+raced+racee)=0 then race_c=6;
datalines;
0 0 0 0 1
0 0 0 1 0
0 0 1 0 0
0 1 0 0 0
1 0 0 0 0
1 0 1 0 0
0 0 0 0 0
;
run;
I would verify that you have values of 0 and not missing in the data. If any of RaceA through RaceE is missing, none of the equality tests can all be true and the + sums evaluate to missing, so race_c will be left missing.
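One way to run that check is sketched below. The dataset name HAVE is a placeholder (the thread never names the survey dataset), and CHECK is a hypothetical output name; the variable names come from the original post.

```sas
/* HAVE is a placeholder for the real survey dataset. */
/* The MISSING option makes PROC FREQ list missing values as a category. */
proc freq data=have;
  tables racea raceb racec raced racee / missing;
run;

/* Keep only the rows where at least one race flag is missing. */
data check;
  set have;
  n_missing = nmiss(of racea raceb racec raced racee);
  if n_missing > 0;
run;

proc print data=check noobs;
run;
```

If CHECK comes back empty, missing values are probably not the cause and the problem lies elsewhere.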
You were right, my format statement was off. Always the small things that get you!!
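For anyone who lands here with the same symptom: the thread does not show the format that was at fault, so the following is only a sketch of how one might separate a format problem from a logic problem. It assumes the recoded variable lives in a dataset called HAVE (a placeholder name).

```sas
/* A FORMAT statement with no format name removes the format for this step only, */
/* so the table shows the stored numeric codes rather than their labels. */
proc freq data=have;
  tables race_c / missing;
  format race_c;
run;
```

If the raw codes 0 through 6 look right here but the labelled output does not, the recode logic is fine and the format definition is the place to look.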
The problem is somewhere outside the code that you have submitted. If it has to do with missing values, you can write code that is more robust by using SAS functions such as SUM and WHICHN. Unlike the + operator, the SUM function ignores missing arguments:
proc format;
value raceCode
0 = "White"
1 = "Black"
2 = "Asian"
3 = "American Indian"
4 = "Native Hawaiian"
5 = "Mixed"
6 = "Other"
OTHER = "Unknown";
run;
data junk;
input racea raceb racec raced racee;
select (sum(of racea -- racee, 0));
when (1) race_c = whichn(1, racee, racec, raceb, racea, raced) - 1;
when (0) race_c = 6;
otherwise race_c = 5;
end;
format race_c raceCode.;
datalines;
0 0 0 0 1
0 0 0 1 0
0 0 1 . 0
0 1 0 0 0
1 0 0 0 0
1 0 1 0 0
0 0 0 0 0
. . . . .
;
proc print noobs; run;