I have a data set with multiple records for each student.
I need to find last enrolled semester if they are not graduated.
id | term | term_code | matriculant | matric_label | Cum_passed |
1 | FS | 6 | 1 | Enrolled | 64 |
1 | SS | 7 | 1 | Enrolled | 80 |
1 | FS | 7 | 1 | Enrolled | 102 |
1 | SS | 8 | 2 | Graduated | |
2 | FS | 12 | 1 | Enrolled | 82 |
2 | SS | 13 | 1 | Enrolled | 94 |
2 | FS | 13 | 1 | Enrolled | 114 |
2 | SS | 14 | 3 | Grad. and still enrolled | 120 |
2 | FS | 14 | 2 | Graduated | |
3 | FS | 13 | 1 | Enrolled | 6 |
3 | SS | 14 | 0 | Not enrolled/graduated | |
3 | FS | 14 | 0 | Not enrolled/graduated | |
3 | SS | 15 | 0 | Not enrolled/graduated | |
3 | FS | 15 | 0 | Not enrolled/graduated | |
4 | FS | 13 | 1 | Enrolled | 12 |
4 | SS | 14 | 0 | Not enrolled/graduated | |
4 | FS | 14 | 1 | Enrolled | 17 |
4 | SS | 15 | 0 | Not enrolled/graduated | |
4 | FS | 15 | 0 | Not enrolled/graduated | |
5 | FS | 15 | 1 | Enrolled | 26 |
5 | SS | 16 | 1 | Enrolled | 39 |
5 | FS | 16 | 1 | Enrolled | 55 |
5 | SS | 17 | 1 | Enrolled | 68 |
5 | FS | 17 | 1 | Enrolled | 82 |
I need output to be or some identifier in new column as these students did not graduate/return.
3 | FS | 13 | 1 | Enrolled | 6 |
4 | FS | 14 | 1 | Enrolled | 17 |
PROC SQL;
CREATE TABLE WORK.RemoveGraduates AS
SELECT a.*, c.IDct, c.MLSum
FROM WORK.Have AS a
LEFT JOIN (SELECT ID FROM WORK.HAVE WHERE matric_label="Graduated") AS b ON a.ID=b.ID
LEFT JOIN (SELECT ID, COUNT(ID) AS IDCt, SUM(CASE WHEN matric_label="Enrolled" THEN 1 ELSE 0 END) AS MLSum FROM WORK.Have GROUP BY ID HAVING Calculated IDCt=Calculated MLSum) AS c ON a.ID=c.ID
WHERE b.ID=.
AND c.ID=.
AND a.matric_label="Enrolled"
ORDER BY a.ID, a.Term_Code DESC, a.Term;
QUIT;
DATA WORK.Want;
SET WORK.RemoveGraduates;
BY ID DESCENDING Term_Code TERM;
IF FIRST.ID AND FIRST.Term_Code AND FIRST.Term THEN OUTPUT;
RUN;
Some of the selection criteria depends on how your fields are set (numeric vs. character) but the overall logic will get what you are looking to accomplish.
DATA have;
FORMAT id 8. term $2. term_code $2. matriculant $2. matric_label $25. Cum_passed 8.;
INFORMAT id 8. term $2. term_code $2. matriculant $2. matric_label $25. Cum_passed 8.;
INPUT id term term_code matriculant matric_label Cum_passed;
CARDS;
1 FS 6 1 Enrolled 64
1 SS 7 1 Enrolled 80
1 FS 7 1 Enrolled 102
1 SS 8 2 Graduated .
2 FS 12 1 Enrolled 82
2 SS 13 1 Enrolled 94
2 FS 13 1 Enrolled 114
2 SS 14 3 Grad.andStillenrolled 120
2 FS 14 2 Graduated .
3 FS 13 1 Enrolled 6
3 SS 14 0 NotEnrolled/graduated .
3 FS 14 0 NotEnrolled/graduated .
3 SS 15 0 NotEnrolled/graduated .
3 FS 15 0 NotEnrolled/graduated .
4 FS 13 1 Enrolled 12
4 SS 14 0 NotEnrolled/graduated .
4 FS 14 1 Enrolled 17
4 SS 15 0 NotEnrolled/graduated .
4 FS 15 0 NotEnrolled/graduated .
5 FS 15 1 Enrolled 26
5 SS 16 1 Enrolled 39
5 FS 16 1 Enrolled 55
5 SS 17 1 Enrolled 68
5 FS 17 1 Enrolled 82
;
PROC SQL;
CREATE TABLE WORK.RemoveGraduates AS
SELECT a.*
FROM WORK.Have AS a
LEFT JOIN (SELECT ID
FROM WORK.HAVE
WHERE matric_label="Graduated") AS b ON a.ID=b.ID
WHERE b.ID=.
AND a.matric_label="Enrolled"
ORDER BY a.ID, a.Term DESC;
QUIT;
DATA WORK.Want;
SET WORK.RemoveGraduates;
BY ID DESCENDING TERM;
IF FIRST.ID AND FIRST.Term THEN OUTPUT;
RUN;
Thanks for your reply, it is giving me second last term for some ids.
I have attached file and highlighted some of the rows which i desire as output.
If student is enrolled throughout or graduated or graduated and enrolled these can be deleted.
I am looking for student who did not return with their last term's cumulative gpa.
I have attached my worksheet.
Made some updates overall. Give this a try:
DATA have;
FORMAT id 8. term $2. term_code 2. matriculant $2. matric_label $25. Cum_passed 8.;
INFORMAT id 8. term $2. term_code 2. matriculant $2. matric_label $25. Cum_passed 8.;
INPUT id term term_code matriculant matric_label Cum_passed;
CARDS;
1 FS 6 1 Enrolled 64
1 SS 7 1 Enrolled 80
1 FS 7 1 Enrolled 102
1 SS 8 2 Graduated .
2 FS 12 1 Enrolled 82
2 SS 13 1 Enrolled 94
2 FS 13 1 Enrolled 114
2 SS 14 3 Grad.andStillenrolled 120
2 FS 14 2 Graduated .
3 FS 13 1 Enrolled 6
3 SS 14 0 NotEnrolled/graduated .
3 FS 14 0 NotEnrolled/graduated .
3 SS 15 0 NotEnrolled/graduated .
3 FS 15 0 NotEnrolled/graduated .
4 FS 13 1 Enrolled 12
4 SS 14 0 NotEnrolled/graduated .
4 FS 14 1 Enrolled 17
4 SS 15 0 NotEnrolled/graduated .
4 FS 15 0 NotEnrolled/graduated .
5 FS 15 1 Enrolled 26
5 SS 16 1 Enrolled 39
5 FS 16 1 Enrolled 55
5 SS 17 1 Enrolled 68
5 FS 17 1 Enrolled 82
16020 FS 7 1 Enrolled 28
16020 SS 8 1 Enrolled 44
16020 FS 8 1 Enrolled 58
16020 SS 9 1 Enrolled 74
16020 FS 9 1 Enrolled 92
16020 SS 10 1 Enrolled 103
16020 FS 10 1 Enrolled 126
16020 SS 11 1 Enrolled 133
16020 FS 11 1 Enrolled 141
16020 SS 12 0 NotEnrolled/graduated .
16020 FS 12 0 NotEnrolled/graduated .
16020 SS 13 0 NotEnrolled/graduated .
17387 FS 6 1 Enrolled 1
17387 SS 7 0 NotEnrolled/graduated .
17387 FS 7 0 NotEnrolled/graduated .
17387 SS 8 0 NotEnrolled/graduated .
17387 FS 8 0 NotEnrolled/graduated .
17387 SS 9 0 NotEnrolled/graduated .
17387 FS 9 0 NotEnrolled/graduated .
17387 SS 10 0 NotEnrolled/graduated .
17387 FS 10 0 NotEnrolled/graduated .
17387 SS 11 0 NotEnrolled/graduated .
17387 FS 11 0 NotEnrolled/graduated .
17387 SS 12 0 NotEnrolled/graduated .
18088 FS 6 1 Enrolled 14
18088 SS 7 1 Enrolled 28
18088 FS 7 1 Enrolled 37
18088 SS 8 1 Enrolled 46
18088 FS 8 1 Enrolled 67
18088 SS 9 1 Enrolled 83
18088 FS 9 1 Enrolled 102
18088 SS 10 1 Enrolled 114
18088 FS 10 1 Enrolled 132
18088 SS 11 1 Enrolled 141
18088 FS 11 0 NotEnrolled/graduated .
18088 SS 12 3 Grad.andStillEnrolled 158
;
PROC SQL;
CREATE TABLE WORK.RemoveGraduates AS
SELECT a.*
FROM WORK.Have AS a
LEFT JOIN (SELECT ID
FROM WORK.HAVE
WHERE matric_label="Graduated") AS b ON a.ID=b.ID
WHERE b.ID=.
AND a.matric_label="Enrolled"
ORDER BY a.ID, a.Term_Code DESC, a.Term;
QUIT;
DATA WORK.Want;
SET WORK.RemoveGraduates;
BY ID DESCENDING Term_Code TERM;
IF FIRST.ID AND FIRST.Term_Code AND FIRST.Term THEN OUTPUT;
RUN;
This worked too, but can you help me remove who is "enrolled" for all terms.
For example PID=45.
PROC SQL;
CREATE TABLE WORK.RemoveGraduates AS
SELECT a.*, c.IDct, c.MLSum
FROM WORK.Have AS a
LEFT JOIN (SELECT ID FROM WORK.HAVE WHERE matric_label="Graduated") AS b ON a.ID=b.ID
LEFT JOIN (SELECT ID, COUNT(ID) AS IDCt, SUM(CASE WHEN matric_label="Enrolled" THEN 1 ELSE 0 END) AS MLSum FROM WORK.Have GROUP BY ID HAVING Calculated IDCt=Calculated MLSum) AS c ON a.ID=c.ID
WHERE b.ID=.
AND c.ID=.
AND a.matric_label="Enrolled"
ORDER BY a.ID, a.Term_Code DESC, a.Term;
QUIT;
DATA WORK.Want;
SET WORK.RemoveGraduates;
BY ID DESCENDING Term_Code TERM;
IF FIRST.ID AND FIRST.Term_Code AND FIRST.Term THEN OUTPUT;
RUN;
Thanks a lot. This is what I was looking for.
Much appreciate your help.
Glad I could help.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.