## finding duplicate records

Solved
Super Contributor
Posts: 286

# finding duplicate records

I’m looking at 1st sem grades and I’ve noticed that occasionally a student has 2 different marks for the same subject in the same term.  I sorted this by student_num (which is unique for each student), crstitle, coursenum and mark.  Is there some way I can find records where all of these variables would be exactly the same except for the mark?  I need to eliminate those kinds of records  Thanks.

 The SAS System

 Obs Student_Num Crstitle CourseNum Mark 1 10031 Lang Arts 7 GT 2001G700 91 2 10031 Life Science 7 22210700 97 3 10031 Social Studies 7 23100700 98 4 10031 Advanced Art 7 2501A700 100 5 10031 Strings 7 25500700 100 6 10031 Algebra 1 H 4111MSHW 96 7 10041 Lang Arts 7 20010700 94 8 10041 Pre-Algebra 2110PA00 90 9 10041 Life Science 7 22210700 89 10 10041 Social Studies 7 23100700 93 11 10041 Art 7 2501S700 100 12 10041 Chorus 7 25410700 99 13 10069 Lang Arts 6 GT 1001G600 95 14 10069 Pre-Algebra 6 1110P600 97 15 10069 Science 6 12010600 97 16 10069 Social Studies 6 13010600 97 17 10069 Strings 6 15510600 99 18 10069 PE/Health 6 2460S600 100

Accepted Solutions
Solution
‎01-27-2012 11:23 AM
Posts: 3,167

## finding duplicate records

There are many ways to solve your problem. I assume you only to get rid of the duplicates, not all of obs with duplicates.

Proc sort data=have out=want nodupkey;

by  student_num crstitle coursenum;

run;

Regards,

Haikuo

All Replies
Solution
‎01-27-2012 11:23 AM
Posts: 3,167

## finding duplicate records

There are many ways to solve your problem. I assume you only to get rid of the duplicates, not all of obs with duplicates.

Proc sort data=have out=want nodupkey;

by  student_num crstitle coursenum;

run;

Regards,

Haikuo

Super Contributor
Posts: 1,636

## Re: finding duplicate records

options nocenter;

data have;

input Student_Num \$ Crstitle \$ CourseNum Mark;

cards;

1 aa 100 90

1 bb 200 100

1 aa 100 98

2 aa 100 90

2 bb 200 100

2 aa 100 98

3 aa 100 90

3 bb 200 100

4 aa 100 98

;

proc sort;

by Student_Num  Crstitle  CourseNum;

data want;

set have;

by Student_Num  Crstitle  CourseNum;

if first.CourseNum and last.CourseNum then delete;

proc print;

title duplicate records by Student_Num  Crstitle  CourseNum;

run;

title Unique records by Student_Num  Crstitle  CourseNum;

proc sql;

select *,count(*) as total from have

group by Student_Num,  Crstitle,  CourseNum

having total=1;

quit;

title;

duplicate records by Student_Num  Crstitle  CourseNum

Student_                Course

Obs      Num       Crstitle      Num     Mark

1        1           aa         100      90

2        1           aa         100      98

3        2           aa         100      90

4        2           aa         100      98

Unique records by Student_Num  Crstitle  CourseNum

Student_Num  Crstitle  CourseNum      Mark     total

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

1            bb              200       100         1

2            bb              200       100         1

3            aa              100        90         1

3            bb              200       100         1

4            aa              100        98         1

Linlin

Frequent Contributor
Posts: 129

## Re: finding duplicate records

Gregg,

How do you want to handle the different values for Mark?

If you do not care,  then HaiKuo's suggestion will work fine.  Proc sort with nodupkeys will return an arbitrary record.

If you want to restrict to largest or smallest then a variation of LinLin's code will work.

proc sort;

by Student_Num  Crstitle  CourseNum Mark;

data want;

set have;

by Student_Num  Crstitle  CourseNum;

* keeps lowest value of Mark

if you want largest value then substitute

last.CourseNum for first.courseNum in the

locigal portion of the if statement ;

if not first.CourseNum and last.CourseNum then delete;

LinLin's code could also be modified to handle other criteria.

🔒 This topic is solved and locked.