Help using Base SAS procedures

finding duplicate records

Accepted Solution Solved
Reply
Super Contributor
Posts: 268
Accepted Solution

finding duplicate records

I’m looking at 1st sem grades and I’ve noticed that occasionally a student has 2 different marks for the same subject in the same term.  I sorted this by student_num (which is unique for each student), crstitle, coursenum and mark.  Is there some way I can find records where all of these variables would be exactly the same except for the mark?  I need to eliminate those kinds of records  Thanks.

The SAS System

Obs

Student_Num

Crstitle

CourseNum

Mark

1

10031

Lang Arts 7 GT

2001G700

91

2

10031

Life Science 7

22210700

97

3

10031

Social Studies 7

23100700

98

4

10031

Advanced Art 7

2501A700

100

5

10031

Strings 7

25500700

100

6

10031

Algebra 1 H

4111MSHW

96

7

10041

Lang Arts 7

20010700

94

8

10041

Pre-Algebra

2110PA00

90

9

10041

Life Science 7

22210700

89

10

10041

Social Studies 7

23100700

93

11

10041

Art 7

2501S700

100

12

10041

Chorus 7

25410700

99

13

10069

Lang Arts 6 GT

1001G600

95

14

10069

Pre-Algebra 6

1110P600

97

15

10069

Science 6

12010600

97

16

10069

Social Studies 6

13010600

97

17

10069

Strings 6

15510600

99

18

10069

PE/Health 6

2460S600

100


Accepted Solutions
Solution
‎01-27-2012 11:23 AM
Respected Advisor
Posts: 3,124

finding duplicate records

There are many ways to solve your problem. I assume you only to get rid of the duplicates, not all of obs with duplicates.

Proc sort data=have out=want nodupkey;

by  student_num crstitle coursenum;

run;

Regards,

Haikuo

View solution in original post


All Replies
Solution
‎01-27-2012 11:23 AM
Respected Advisor
Posts: 3,124

finding duplicate records

There are many ways to solve your problem. I assume you only to get rid of the duplicates, not all of obs with duplicates.

Proc sort data=have out=want nodupkey;

by  student_num crstitle coursenum;

run;

Regards,

Haikuo

Super Contributor
Posts: 1,636

Re: finding duplicate records

How about:

options nocenter;

data have;

input Student_Num $ Crstitle $ CourseNum Mark;

cards;

1 aa 100 90

1 bb 200 100

1 aa 100 98

2 aa 100 90

2 bb 200 100

2 aa 100 98

3 aa 100 90

3 bb 200 100

4 aa 100 98

;

proc sort;

by Student_Num  Crstitle  CourseNum;

data want;

  set have;

  by Student_Num  Crstitle  CourseNum;

  if first.CourseNum and last.CourseNum then delete;

proc print;

title duplicate records by Student_Num  Crstitle  CourseNum;

run;

title Unique records by Student_Num  Crstitle  CourseNum;

proc sql;

  select *,count(*) as total from have

    group by Student_Num,  Crstitle,  CourseNum

        having total=1;

quit;

title;

duplicate records by Student_Num  Crstitle  CourseNum          

       Student_                Course

Obs      Num       Crstitle      Num     Mark

1        1           aa         100      90

2        1           aa         100      98

3        2           aa         100      90

4        2           aa         100      98

Unique records by Student_Num  Crstitle  CourseNum             

Student_Num  Crstitle  CourseNum      Mark     total

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

1            bb              200       100         1

2            bb              200       100         1

3            aa              100        90         1

3            bb              200       100         1

4            aa              100        98         1

Linlin

Frequent Contributor
Posts: 129

Re: finding duplicate records

Gregg,

How do you want to handle the different values for Mark?

If you do not care,  then HaiKuo's suggestion will work fine.  Proc sort with nodupkeys will return an arbitrary record.

If you want to restrict to largest or smallest then a variation of LinLin's code will work. 

proc sort;

by Student_Num  Crstitle  CourseNum Mark;

data want;

  set have;

  by Student_Num  Crstitle  CourseNum;

  * keeps lowest value of Mark

    if you want largest value then substitute

     last.CourseNum for first.courseNum in the

     locigal portion of the if statement ;

  if not first.CourseNum and last.CourseNum then delete;

LinLin's code could also be modified to handle other criteria.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 272 views
  • 3 likes
  • 4 in conversation