Help using Base SAS procedures

dealing with duplicate records

Reply
New Contributor
Posts: 4

dealing with duplicate records

I have data containing student IDs, classes, grades, semesters, etc. I want to find all the times that a student repeated a class; so the same student ID and the same course number but not necessarily the same grade or semester. i want to keep only the last (by date) recorded grade for that class but at the same time i want to count the number of repeat classes.
here is some data:

ID@semester(1st number=semester number/last 2 numbers=year)
@class@credits@grade recieved

04F1M@191@C S 131@2.0@UW@
04F1M@187 @CL CV 241@3.0@C@
04F1M@587@ECON 110@3.0@D-@
04F1M@590@ECON 110@3.0@B@
04F1M@586@ENGL 115@4.0@B@
04F1M@590@ENGL 391@3.0@C-@

thanks
Esteemed Advisor
Posts: 7,092

Re: dealing with duplicate records

You might be able to do what you want with something like:


data have;
infile cards delimiter="@";
informat class $10.;
format date date9.;
input ID $ semester $ class credits grade_recieved $;
date=mdy(substr(semester,1,1),1,substr(semester,2,2));
cards;
04F1M@191@C S 131@2.0@UW
04F1M@187 @CL CV 241@3.0@C
04F1M@587@ECON 110@3.0@D-
04F1M@590@ECON 110@3.0@B
04F1M@586@ENGL 115@4.0@B
04F1M@590@ENGL 391@3.0@C-
;

proc sort data=have;
by ID class date;
run;

data want;
set have;
by id class;
if first.class then count=1;
else count+1;
if last.class then output;
run;

HTH,
Art
Super Contributor
Posts: 578

Re: dealing with duplicate records

How about:

data Classes_Taken;
infile cards delimiter="@";
informat class $10.;
format date date9.;
input ID $ semester $ class credits grade_recieved $;
class_date=mdy(substr(semester,1,1),1,substr(semester,2,2));
cards;
04F1M@191@C S 131@2.0@UW
04F1M@187 @CL CV 241@3.0@C
04F1M@587@ECON 110@3.0@D-
04F1M@590@ECON 110@3.0@B
04F1M@586@ENGL 115@4.0@B
04F1M@590@ENGL 391@3.0@C-
;
run;

proc sql;
create table retakes as
select
id,
class,
count(*) as times_taken,
max(class_date) Last_date
from work.classes_taken
group by id, class
having count(*)>1;

create table last_retake as
select
t2.*,
t1.times_taken
from work.retakes t1 inner join work.classes_taken t2
on t1.id=t2.id and t1.class=t2.class and t1.last_date=t2.class_date;
quit;
Ask a Question
Discussion stats
  • 2 replies
  • 62 views
  • 0 likes
  • 3 in conversation