BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mayasak
Quartz | Level 8

Hi,

I have data with 5 variables for race. Each var has either "y" or missing as this sample:

RaceAABRaceAMINRaceASIANRaceNH_PIRaceWHITERace
    YWhite
Y    African American or Black
  Y  Asian
   Y Native Hawaiian
Y    African American or Black
 Y   Native American
    YWhite

I need to create a new variable "race" as in the 6th column above.

I tried this but its not giving me right resuts:

data want;
set have;

if Racewhite= "y" then race = "white";
else if RaceAAB = "y" then race ="African American or black";
else if RaceAmin = "y" then race ="Native American";
else if Raceasian = "y" then race ="asian";
else if RaceNH_PI = "y" then race ="Native Hawaiian";
run;

Please advise.

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ

If it is that simple, and those are the only race/demographic variables you have, a bunch of IF-THEN statements are preferable. My solution is a bit convoluted and could be considered overkill in this example. It also assumes that no one can be multiple races. If someone has multiple races, it will take the last observation carrying that value.

data have;
infile datalines delimiter = ',' dsd;
input RaceAAB $ RaceAMIN $ RaceASIAN $ RaceNH_PI $ RaceWHITE $;
datalines;
,,,,Y
Y,,,,
,,Y,,
,,,Y,
Y,,,,
,Y,,,
,,,,Y
;
run;

proc format;
	value $race_col
		'AAB' = 'African American or Black'
		'AMIN' = 'Native American'
		'ASIAN' = 'Asian'
		'NH_PI' = 'Native Hawaiian'
		'WHITE' = 'White'
		other = 'Not matched';
run;

data want (drop = i);
	set have;
	array _r [*] $ RaceAAB -- RaceWHITE;
	do i = 1 to dim(_r);
		if _r[i] = 'Y' then Race = put(strip(tranwrd(vname(_r[i]), 'Race', '')), $race_col.);
	end;
run;

maguiremq_0-1652366337530.png

As always, if something can be done simply, do it that way.

View solution in original post

4 REPLIES 4
Tom
Super User Tom
Super User

How is it not right?

 

Your example data is showing Y in the variables (uppercase), but your code is looking for y (lowercase).  So RACE will probably be missing for all observations.

 

You did not define RACE before using it.  So SAS will guess how you wanted it defined by how you first used it. With your posted code that means RACE will be define as LENGTH $5 since the string "white" needs only 5 bytes.  So the other values you attempt to store into RACE will be truncated to only 5 bytes.  Add a LENGTH statement before referencing RACE so you can define it to be long enough to hold the longest value.

data want;
  set have;
  length race $30 ;
  if Racewhite= "Y" then race = "white";
  else if RaceAAB = "Y" then race ="African American or black";
  else if RaceAmin = "Y" then race ="Native American";
  else if Raceasian = "Y" then race ="asian";
  else if RaceNH_PI = "Y" then race ="Native Hawaiian";
run;

 

maguiremq
SAS Super FREQ

If it is that simple, and those are the only race/demographic variables you have, a bunch of IF-THEN statements are preferable. My solution is a bit convoluted and could be considered overkill in this example. It also assumes that no one can be multiple races. If someone has multiple races, it will take the last observation carrying that value.

data have;
infile datalines delimiter = ',' dsd;
input RaceAAB $ RaceAMIN $ RaceASIAN $ RaceNH_PI $ RaceWHITE $;
datalines;
,,,,Y
Y,,,,
,,Y,,
,,,Y,
Y,,,,
,Y,,,
,,,,Y
;
run;

proc format;
	value $race_col
		'AAB' = 'African American or Black'
		'AMIN' = 'Native American'
		'ASIAN' = 'Asian'
		'NH_PI' = 'Native Hawaiian'
		'WHITE' = 'White'
		other = 'Not matched';
run;

data want (drop = i);
	set have;
	array _r [*] $ RaceAAB -- RaceWHITE;
	do i = 1 to dim(_r);
		if _r[i] = 'Y' then Race = put(strip(tranwrd(vname(_r[i]), 'Race', '')), $race_col.);
	end;
run;

maguiremq_0-1652366337530.png

As always, if something can be done simply, do it that way.

ballardw
Super User

I would run code similar to this to verify that you only have one race category per record before proceeding.

Proc freq data=have;
   tables racewhite*raceaab*raceamin*raceasian*racenh_pi/ list nopercent nocum missing;
run;

Many of the data systems I have seen that allow the sort of data you show also allow multiple choices. If you get any of the results from the Proc Freq code I show with more than one of the race variables in a row with Y then you need to consider how those are to be coded.

mayasak
Quartz | Level 8
Sure. Thank you

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1689 views
  • 1 like
  • 4 in conversation