Hi,
I’m trying to determine the race of subjects in my dataset based on their last name. I have a list of last names that indicate a given race. My dataset looks something like this:
White_name Subject_name Race
Johnson Appleby White
MacDonald Brians White
Burns MacDonald [blank]
Manning Murphy White
White_name represents a databank of all potential white last names.
Subject_name represents the last name of the subjects in my sample.
Race is a partially complete variable with information on their race.
I would like to fill in the blanks in the Race variable based on the Subject_name being found in the White_name list.
My logic would flow something like this:
if subject_name=any of the White_name then race2=’White’; else race2=race;
However, I believe I need to loop through all the names in the White_name variable in order to find a match. Note that I have a large cohort with many repetitions of last names.
Would appreciate any help! Please note that I am a very new SAS user!
Thank you!
I assume you have a dataset that maps the names to race?
I would recommend proc format, using the dataset to create your format and then apply it.
Here's a paper that's worth reading regarding proc format.
FYI, this method of assigning race by last names is flawed beyond belief and has been thoroughly shown to be incorrect and unreliable.
data lookup;
informat name race $20.;
input name $ race $;
cards;
Johnson White
McDonald White
Burns White
Ortez Mexican
;
run;
data have;
input name $20.;
cards;
Johnson
McDonald
Burns
Ortez
Wrong
;
run;
data create_format;
set lookup end=last;
fmtname='NameToRace';
type='C';
start=name;
label=race;
if last then do;
output;
hlo='O';
label='Other';
output;
end;
else output;
run;
proc format cntlin=create_format;
run;
data want;
set have;
race=put(name, $NameToRace.);
run;
The use of Proc Format or Hash Objects will be suitable for your problem. I advise you to read on creating Fornats from a Data Set as Hash objects may be much for you now.
I can't believe I am writing this. This would be totally illegal in some countries! Race? Really?
Anyway, this may do:
data T;
infile cards missover;
input White_name $ Subject_name $ Race $;
cards;
Johnson Appleby White
MacDonald Brians White
Burns MacDonald
Manning Murphy White
xx zz
run;
proc sql;
select a.*
,coalescec(b.RACE,a.RACE) as RACE2
from T a
left join
T b on a.Subject_name=b.White_name;
quit;
White_name | Subject_name | Race | RACE2 |
---|---|---|---|
Johnson | Appleby | White | White |
MacDonal | Brians | White | White |
Burns | MacDonal | White | |
Manning | Murphy | White | White |
xx | zz |
Thank you for your response!
My data is quite large (8000+) names. Would this method require me to write all the names in SAS?
Where would the names be if not in SAS?
This is a dodgy process. I think I'll be more comfortable staying out of it actually.
This project is designed to prove how unreliable it is to base race off last names.
My data are in an excel file which I have imported into SAS. It looks like in your code that you manually type in all those names. Is there any way to select the variable as a whole without typing out all the names in the code?
@ChrisNZ typed in lines of data to match your example data as you didn't provide that to us. The set that has the names you have already read into SAS can be used.
There may be some other issues on how to use your exact data as we do not have that but general approach is what was demonstrated.
I typed in the names to generate a sample data set. Replace your dataset with the names of the sample datasets.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.