Solved: finding duplicate records

GreggB · Posted 01-27-2012 11:08 AM

I’m looking at 1^st sem grades and I’ve noticed that occasionally a student has 2 different marks for the same subject in the same term. I sorted this by student_num (which is unique for each student), crstitle, coursenum and mark. Is there some way I can find records where all of these variables would be exactly the same except for the mark? I need to eliminate those kinds of records Thanks.

The SAS System

Obs	Student_Num	Crstitle	CourseNum	Mark
1	10031	Lang Arts 7 GT	2001G700	91
2	10031	Life Science 7	22210700	97
3	10031	Social Studies 7	23100700	98
4	10031	Advanced Art 7	2501A700	100
5	10031	Strings 7	25500700	100
6	10031	Algebra 1 H	4111MSHW	96
7	10041	Lang Arts 7	20010700	94
8	10041	Pre-Algebra	2110PA00	90
9	10041	Life Science 7	22210700	89
10	10041	Social Studies 7	23100700	93
11	10041	Art 7	2501S700	100
12	10041	Chorus 7	25410700	99
13	10069	Lang Arts 6 GT	1001G600	95
14	10069	Pre-Algebra 6	1110P600	97
15	10069	Science 6	12010600	97
16	10069	Social Studies 6	13010600	97
17	10069	Strings 6	15510600	99
18	10069	PE/Health 6	2460S600	100

Haikuo · Posted 01-27-2012 11:23 AM

There are many ways to solve your problem. I assume you only to get rid of the duplicates, not all of obs with duplicates.

Proc sort data=have out=want nodupkey;

by student_num crstitle coursenum;

run;

Regards,

Haikuo

View solution in original post

Haikuo · Posted 01-27-2012 11:23 AM

There are many ways to solve your problem. I assume you only to get rid of the duplicates, not all of obs with duplicates.

Proc sort data=have out=want nodupkey;

by student_num crstitle coursenum;

run;

Regards,

Haikuo

Linlin · Posted 01-27-2012 11:27 AM

How about:

options nocenter;

data have;

input Student_Num $ Crstitle $ CourseNum Mark;

cards;

1 aa 100 90

1 bb 200 100

1 aa 100 98

2 aa 100 90

2 bb 200 100

2 aa 100 98

3 aa 100 90

3 bb 200 100

4 aa 100 98

;

proc sort;

by Student_Num Crstitle CourseNum;

data want;

set have;

by Student_Num Crstitle CourseNum;

if first.CourseNum and last.CourseNum then delete;

proc print;

title duplicate records by Student_Num Crstitle CourseNum;

run;

title Unique records by Student_Num Crstitle CourseNum;

proc sql;

select *,count(*) as total from have

group by Student_Num, Crstitle, CourseNum

having total=1;

quit;

title;

duplicate records by Student_Num Crstitle CourseNum

Student_ Course

Obs Num Crstitle Num Mark

1 1 aa 100 90

2 1 aa 100 98

3 2 aa 100 90

4 2 aa 100 98

Unique records by Student_Num Crstitle CourseNum

Student_Num Crstitle CourseNum Mark total

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

1 bb 200 100 1

2 bb 200 100 1

3 aa 100 90 1

3 bb 200 100 1

4 aa 100 98 1

Linlin

LarryWorley · Posted 01-27-2012 12:58 PM

Gregg,

How do you want to handle the different values for Mark?

If you do not care, then HaiKuo's suggestion will work fine. Proc sort with nodupkeys will return an arbitrary record.

If you want to restrict to largest or smallest then a variation of LinLin's code will work.

proc sort;

by Student_Num Crstitle CourseNum Mark;

data want;

set have;

by Student_Num Crstitle CourseNum;

* keeps lowest value of Mark

if you want largest value then substitute

last.CourseNum for first.courseNum in the

locigal portion of the if statement ;

if not first.CourseNum and last.CourseNum then delete;

LinLin's code could also be modified to handle other criteria.

finding duplicate records

finding duplicate records

finding duplicate records

Re: finding duplicate records

Re: finding duplicate records

Catch up on SAS Innovate 2026

finding duplicate records

finding duplicate records

finding duplicate records

Re: finding duplicate records

Re: finding duplicate records

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away