I’m looking at 1st sem grades and I’ve noticed that occasionally a student has 2 different marks for the same subject in the same term. I sorted this by student_num (which is unique for each student), crstitle, coursenum and mark. Is there some way I can find records where all of these variables would be exactly the same except for the mark? I need to eliminate those kinds of records Thanks.
The SAS System |
Obs | Student_Num | Crstitle | CourseNum | Mark |
1 | 10031 | Lang Arts 7 GT | 2001G700 | 91 |
2 | 10031 | Life Science 7 | 22210700 | 97 |
3 | 10031 | Social Studies 7 | 23100700 | 98 |
4 | 10031 | Advanced Art 7 | 2501A700 | 100 |
5 | 10031 | Strings 7 | 25500700 | 100 |
6 | 10031 | Algebra 1 H | 4111MSHW | 96 |
7 | 10041 | Lang Arts 7 | 20010700 | 94 |
8 | 10041 | Pre-Algebra | 2110PA00 | 90 |
9 | 10041 | Life Science 7 | 22210700 | 89 |
10 | 10041 | Social Studies 7 | 23100700 | 93 |
11 | 10041 | Art 7 | 2501S700 | 100 |
12 | 10041 | Chorus 7 | 25410700 | 99 |
13 | 10069 | Lang Arts 6 GT | 1001G600 | 95 |
14 | 10069 | Pre-Algebra 6 | 1110P600 | 97 |
15 | 10069 | Science 6 | 12010600 | 97 |
16 | 10069 | Social Studies 6 | 13010600 | 97 |
17 | 10069 | Strings 6 | 15510600 | 99 |
18 | 10069 | PE/Health 6 | 2460S600 | 100 |
There are many ways to solve your problem. I assume you only to get rid of the duplicates, not all of obs with duplicates.
Proc sort data=have out=want nodupkey;
by student_num crstitle coursenum;
run;
Regards,
Haikuo
There are many ways to solve your problem. I assume you only to get rid of the duplicates, not all of obs with duplicates.
Proc sort data=have out=want nodupkey;
by student_num crstitle coursenum;
run;
Regards,
Haikuo
How about:
options nocenter;
data have;
input Student_Num $ Crstitle $ CourseNum Mark;
cards;
1 aa 100 90
1 bb 200 100
1 aa 100 98
2 aa 100 90
2 bb 200 100
2 aa 100 98
3 aa 100 90
3 bb 200 100
4 aa 100 98
;
proc sort;
by Student_Num Crstitle CourseNum;
data want;
set have;
by Student_Num Crstitle CourseNum;
if first.CourseNum and last.CourseNum then delete;
proc print;
title duplicate records by Student_Num Crstitle CourseNum;
run;
title Unique records by Student_Num Crstitle CourseNum;
proc sql;
select *,count(*) as total from have
group by Student_Num, Crstitle, CourseNum
having total=1;
quit;
title;
duplicate records by Student_Num Crstitle CourseNum
Student_ Course
Obs Num Crstitle Num Mark
1 1 aa 100 90
2 1 aa 100 98
3 2 aa 100 90
4 2 aa 100 98
Unique records by Student_Num Crstitle CourseNum
Student_Num Crstitle CourseNum Mark total
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
1 bb 200 100 1
2 bb 200 100 1
3 aa 100 90 1
3 bb 200 100 1
4 aa 100 98 1
Linlin
Gregg,
How do you want to handle the different values for Mark?
If you do not care, then HaiKuo's suggestion will work fine. Proc sort with nodupkeys will return an arbitrary record.
If you want to restrict to largest or smallest then a variation of LinLin's code will work.
proc sort;
by Student_Num Crstitle CourseNum Mark;
data want;
set have;
by Student_Num Crstitle CourseNum;
* keeps lowest value of Mark
if you want largest value then substitute
last.CourseNum for first.courseNum in the
locigal portion of the if statement ;
if not first.CourseNum and last.CourseNum then delete;
LinLin's code could also be modified to handle other criteria.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.