BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lydiawawa
Lapis Lazuli | Level 10

I have a dataset with race variables such as the following:

 

White Black Hispanic Asian Other 

0           1           1          0        0

1           0           1          0        0

1           0           0          0        0

1           0           0          0        0

0           0           0          1        0

1           0           0          0        0

1           0           0          0        0

0           0           0          0        1

 

In the dataset, respondent defines he or her race by 0 or 1. Value 1 indicates true that the respondent is defined as the race while 0 is false. There are missing values too.

I'm hoping to create another variable Race to define each respondent by race or races from the original dataset such as:

 

Race

Black/Hispanic

White/Hispanic

White

White

Asian

White

White

Other

 

I have the following code to create Race, but the number of observation is greater than the original dataset. It maybe caused by the respondents identify themselves with multiple races. How do I expand the code to account for multiple races and to produce the desired outcome?

data race1(keep = white--other race);
 set race1;
  array loc {*} white--other;
    do _i = 1 to dim(loc);
      if loc[_i] = '1' and loc[_i] ne ' ' then race = vname(loc[_i]);
      output;
    end;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
input White Black Hispanic Asian Other ;
cards;
0           1           1          0        0
1           0           1          0        0
1           0           0          0        0
1           0           0          0        0
0           0           0          1        0
1           0           0          0        0
1           0           0          0        0
0           0           0          0        1
;

data want;
set have;
length race $50;
array t white--other;
do _i_=1 to dim(t);
if t(_i_) then Race=catx('/',race,vname(t(_i_)));
end;
keep race;
run;

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20
data have;
input White Black Hispanic Asian Other ;
cards;
0           1           1          0        0
1           0           1          0        0
1           0           0          0        0
1           0           0          0        0
0           0           0          1        0
1           0           0          0        0
1           0           0          0        0
0           0           0          0        1
;

data want;
set have;
length race $50;
array t white--other;
do _i_=1 to dim(t);
if t(_i_) then Race=catx('/',race,vname(t(_i_)));
end;
keep race;
run;
ballardw
Super User

To companion @novinosrin's approach, which is the same I would have chosen you may want a format to display more meaningful text.

And example:

proc format library=work;
value $race
'10000' ='White'
'01000' ='Black'
'00100' ='Hispanic'
'00010' ='Asian'
'10100' ='White/Hispanic'
'01100' ='Black/Hispanic'
other ='Other'
;
run;

I do not know which text meanings may be of critical need but the format approach lets create different text for reports without having to recode any variables. The above format goes with the order of the variables a novinosrin used.

 

The other at the end of the format is "anything not previously assigned is displayed with this text".

I have similar data and have text values of "White Only" "Black Only" "Asian Only" to be clear in reports that combinations are elsewhere. I have some reports that have a hierarchy such that if Hispanic occurs in combination with anything else it is reported as "Hispanic". I that case there are a few more code combinations on the Hispanic line. Another requirement is if two races (but not Hispanic) occur for a person then the category is "Two or more races". So that would have a different line in the format for all the 11000 10010 10001 11010 11001 etc. (which is easier with an other line, code all the others first).

At least you aren't having to deal with Native Hawaiian/ Pacific Islander and Alaska Native/American Indian codes as well (I do).

lydiawawa
Lapis Lazuli | Level 10
Hi @ballardw I do have to deal with Native Hawaiian/ Pacific Islander and Alaska Native/American Indian. The dataset was just a simplified version. The way you coded the format for race does it mean I have to concatenate responses into the form of '10000' and '01000' before setting the format?
ballardw
Super User

@lydiawawa wrote:
Hi @ballardw I do have to deal with Native Hawaiian/ Pacific Islander and Alaska Native/American Indian. The dataset was just a simplified version. The way you coded the format for race does it mean I have to concatenate responses into the form of '10000' and '01000' before setting the format?

Yes.

data have;
input White Black Hispanic Asian Other ;
cards;
0           1           1          0        0
1           0           1          0        0
1           0           0          0        0
1           0           0          0        0
0           0           0          1        0
1           0           0          0        0
1           0           0          0        0
0           0           0          0        1
;

data want;
set have;
length race $50;
array t white--other;
race= catt(of t(*));
run;
proc format library=work;
value $race
'10000' ='White'
'01000' ='Black'
'00100' ='Hispanic'
'00010' ='Asian'
'10100' ='White/Hispanic'
'01100' ='Black/Hispanic'
other ='Other'
;
run;

proc print data=want;
   var race;
   format race $race.;
run;
Astounding
PROC Star

I would find it easier to read a set of initials:

 

data want;

set have;

race='-----';

if white then substr(race, 1, 1) = 'W';

if black then substr(race, 2, 1) = 'B';

if Hispanic then substr(race, 3, 1) = 'H';

if Asian then substr(race, 4, 1) = 'A';

if other then substr(race, 5, 1) = 'O';

run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2948 views
  • 1 like
  • 4 in conversation