deleted_user
Not applicable
Help! I am trying to create a single RACE variable from many separate Yes/No variables (race_african_american=y or n, race_white=y or n, and so on).

So here's the code:

Data s.race; set s.data_race;
If race_african_american='Y' then RACE='African American';
If race_white='Y' then RACE='White' ;
if race_american_indian='Y' then RACE='American Indian' ; etc etc....

It seems to be working... but none of the resulting values in "RACE" line up with the correct observations! The first instance of race_white='y' is blank, and the following observation (which is race_african_american='y') says 'White' instead of 'African American'. After that first blank record, all the remaining values are correct; they just need to be moved up one row to match the correct subject!

Is there something I don't know about If / Then??
5 REPLIES
data_null__
Jade | Level 19
How about a method that lets SAS do most of the work and provides a way to easily trap data errors.

[pre]
data test;
input id:$3. (race_african_american race_white race_american_indian)($upcase1.);
cards;
001 .ny
002 y
003 y
008 yy
;;;;
run;
proc print;
run;
proc transpose name=race data=test out=eo(where=(col1 eq'Y'));
by id;
var race_:;
run;
data eo errors;
set eo(drop=col1);
by id;
race = propcase(translate(substr(race,6),' ','_')); /* start at 6 to skip the 'race_' prefix */
if first.id and last.id then output eo;
else output errors;
run;
proc print data=eo;
run;
[/pre]
Doc_Duke
Rhodochrosite | Level 12
The approach you are taking assumes a hierarchy of race coding. In this case, it is "last one in wins." The reason that the data are collected as multiple separate fields is that individuals can be of multiple races. If you want to collapse it into one field, you have to determine your hierarchy and then program to that.

IF ... THEN race=...;
IF ... THEN race=...;
etc.
evaluates each statement and assigns race based on the sequential evaluation of each IF-THEN statement. So the LAST one with a true IF clause is the result you get.

On the other hand,
IF ... THEN race=...;
ELSE IF ... THEN race=...;
etc.
evaluates and assigns in a hierarchical fashion, so the first IF that is true is the result you get.

You have to make a decision on the utility of the different schemas and program appropriately.

Also, remember that "y" and "Y" are different responses so you may want to use the UPCASE function or check both values.

I think that the code that data_null__ provided makes the assumption that one, and only one, race type will be coded as a "y" and that multiple "y" values are errors. That may or may not be appropriate depending on the source of the data.

Doc Muhlbaier
Duke
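The difference between a run of plain IF statements and an IF / ELSE IF chain can be sketched as follows. This is only an illustration: the dataset names work.flags and work.race, the sample rows, the LENGTH of 16, and the 'Unknown' fallback are all assumptions made for the example, not part of the original poster's data.

```sas
/* Hypothetical test data: one row per subject, flags in mixed case */
data work.flags;
   input id :$3. race_african_american :$1. race_white :$1. race_american_indian :$1.;
   datalines;
001 N Y N
002 y N N
003 N N Y
004 Y Y N
;
run;

data work.race;
   set work.flags;
   length RACE $16;   /* long enough for 'African American' */

   /* ELSE IF chain: the first true condition wins.            */
   /* UPCASE makes the test insensitive to 'y' versus 'Y'.     */
   if      upcase(race_african_american) = 'Y' then RACE = 'African American';
   else if upcase(race_white)            = 'Y' then RACE = 'White';
   else if upcase(race_american_indian)  = 'Y' then RACE = 'American Indian';
   else                                             RACE = 'Unknown';  /* assumed fallback */
run;

proc print data=work.race;
run;
```

Subject 004 has two Y flags, so the chain assigns 'African American'; the same three tests written as plain IF statements in the same order would leave 'White', because the later assignment overwrites the earlier one.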
deleted_user
Not applicable
Thanks to both of you, for your responses 🙂
Everything is already in uppercase. There are no repeated "Y" values per subject; just one per subject. I understand that individuals can be more than one race, but the way the data was entered initially only allows for one response. Any person of multiple races is 'other' in this case (not the best structure, but it's what I'm working with).

Using the if or if /else if statements has given me the same result; I still see that all the values for "RACE" have been shifted down one, and the very first one is blank. Looking at all the fields, I can see that they would all line up with the original data perfectly if I could shift them all up one cell.
Doc_Duke
Rhodochrosite | Level 12
The code snippet that you provided wouldn't cause that problem. That tells me the problem is somewhere else in your program. Perhaps you have the field "race" in a LAG or RETAIN statement somewhere. Without seeing more code I'm at a loss to help much more.

Perhaps, for debugging, you should break your data step down into multiple steps with PROC PRINTs in between. Or you could add a PUT statement immediately after the series of IFs.
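The PUT suggestion might look something like the sketch below. The dataset names and sample rows are made up for illustration; only the idea of writing the values to the log right after the IF statements comes from the reply above.

```sas
/* Hypothetical input, standing in for the real data */
data work.flags;
   input id :$3. race_african_american :$1. race_white :$1.;
   datalines;
001 N Y
002 Y N
;
run;

data work.race_debug;
   set work.flags;
   length RACE $16;
   if race_african_american = 'Y' then RACE = 'African American';
   if race_white = 'Y' then RACE = 'White';

   /* Write the iteration number and the values just tested to the log, */
   /* so each RACE value can be checked against the flags it came from. */
   put _n_= id= race_african_american= race_white= RACE=;
run;
```

If the flags and RACE printed on the same log line disagree, the problem lies in the IF logic; if they agree in the log but not in the output dataset, something else in the step is changing the row after the IFs run.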
deleted_user
Not applicable
I figured it out. Actually I had posted the wrong code here; I had my 'set' statement in the wrong place. Placing the set statement after the 'if/then/else' statements was messing it all up.

My bad 😞

Thanks for the help 🙂
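For anyone who lands here with the same symptom, the sketch below tries to reproduce it. Variables read by SET keep their values from the previous iteration, so IF statements placed ahead of SET test the previous row's flags, and the result is written out with the row that SET reads afterwards. The dataset names and sample rows are invented, and the flag variables are declared in a LENGTH statement up front so the step compiles cleanly with the IFs ahead of SET; that declaration is an assumption of this sketch, not something taken from the poster's program.

```sas
/* Hypothetical input data */
data work.flags;
   input id :$3. race_african_american :$1. race_white :$1.;
   datalines;
001 N Y
002 Y N
003 N Y
;
run;

/* Misplaced SET: the IFs see the flags left over from the previous row. */
/* On the first iteration nothing has been read yet, so RACE is blank.   */
data work.shifted;
   length race_african_american race_white $1 RACE $16;
   if race_african_american = 'Y' then RACE = 'African American';
   if race_white = 'Y' then RACE = 'White';
   set work.flags;
run;

/* SET first: the IFs test the row that was just read. */
data work.fixed;
   set work.flags;
   length RACE $16;
   if race_african_american = 'Y' then RACE = 'African American';
   if race_white = 'Y' then RACE = 'White';
run;

proc print data=work.shifted; run;
proc print data=work.fixed;   run;
```

Comparing the two printouts should show RACE one row late, with a blank first value, in work.shifted, and aligned with its own row in work.fixed.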

