i have data as seen below. Instructors have a unique email address but may have multiple IDs if they serve more than one school. The restriction on the file I'm submitting is only one ID per instructor. I want to find all cases such as hsmith@yahoo.com and give him the same ID in all obs. (It doesn't matter which one as long as he only has one.) Right now, I'm running a PROC FREQ with a TABLE statement of email*tchid/list and doing a visual inspection (not exactly an elegant solution.)
email tchid
jdoe@aol.com 01
You could use SQL. For example you could decide to pick the smallest id for each email address.
(NOTE: are you sure that the different schools are using different ranges of id values? Otherwise you might have the same id for two different instructors.)
proc sql ;
create table uniqueid as
select distinct email, min(tid) as tid
from mutlipleid
group by email
;
quit;
You could use SQL. For example you could decide to pick the smallest id for each email address.
(NOTE: are you sure that the different schools are using different ranges of id values? Otherwise you might have the same id for two different instructors.)
proc sql ;
create table uniqueid as
select distinct email, min(tid) as tid
from mutlipleid
group by email
;
quit;
or, if you prefer a datastep solution:
data have;
informat email $40.;
input email tchid $;
cards;
jdoe@aol.com 01
;
proc sort data=have (drop=tchid) out=want;
by email;
run;
data want (drop=temp:);
set want;
by email;
if first.email then tempid+1;
tchid=put(tempid,z2.);
run;
That could, of course, be modified to retain some of the existing ids.
this would assign new IDs, wouldn't it? I have to use the ones that are already in the student information system.
Easy to accomodate:
data have;
informat email $40.;
input email tchid $;
cards;
jdoe@aol.com 01
;
proc sort data=have out=want;
by email;
run;
data want (drop=hold:);
set want;
by email;
retain holdid;
if first.email then holdid=tchid;
else tchid=holdid;
run;
schools are using different ranges. they are assigned at a central location to ensure there's no overlap.
you can get what you want by:
proc sort data=have out=want nodupkey;
by email;
run;
Linlin
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.