I have a data set with multiple records for each student.
I need to find last enrolled semester if they are not graduated.
id | term | term_code | matriculant | matric_label | Cum_passed |
1 | FS | 6 | 1 | Enrolled | 64 |
1 | SS | 7 | 1 | Enrolled | 80 |
1 | FS | 7 | 1 | Enrolled | 102 |
1 | SS | 8 | 2 | Graduated | |
2 | FS | 12 | 1 | Enrolled | 82 |
2 | SS | 13 | 1 | Enrolled | 94 |
2 | FS | 13 | 1 | Enrolled | 114 |
2 | SS | 14 | 3 | Grad. and still enrolled | 120 |
2 | FS | 14 | 2 | Graduated | |
3 | FS | 13 | 1 | Enrolled | 6 |
3 | SS | 14 | 0 | Not enrolled/graduated | |
3 | FS | 14 | 0 | Not enrolled/graduated | |
3 | SS | 15 | 0 | Not enrolled/graduated | |
3 | FS | 15 | 0 | Not enrolled/graduated | |
4 | FS | 13 | 1 | Enrolled | 12 |
4 | SS | 14 | 0 | Not enrolled/graduated | |
4 | FS | 14 | 1 | Enrolled | 17 |
4 | SS | 15 | 0 | Not enrolled/graduated | |
4 | FS | 15 | 0 | Not enrolled/graduated | |
5 | FS | 15 | 1 | Enrolled | 26 |
5 | SS | 16 | 1 | Enrolled | 39 |
5 | FS | 16 | 1 | Enrolled | 55 |
5 | SS | 17 | 1 | Enrolled | 68 |
5 | FS | 17 | 1 | Enrolled | 82 |
I need output to be or some identifier in new column as these students did not graduate/return.
3 | FS | 13 | 1 | Enrolled | 6 |
4 | FS | 14 | 1 | Enrolled | 17 |
PROC SQL;
CREATE TABLE WORK.RemoveGraduates AS
SELECT a.*, c.IDct, c.MLSum
FROM WORK.Have AS a
LEFT JOIN (SELECT ID FROM WORK.HAVE WHERE matric_label="Graduated") AS b ON a.ID=b.ID
LEFT JOIN (SELECT ID, COUNT(ID) AS IDCt, SUM(CASE WHEN matric_label="Enrolled" THEN 1 ELSE 0 END) AS MLSum FROM WORK.Have GROUP BY ID HAVING Calculated IDCt=Calculated MLSum) AS c ON a.ID=c.ID
WHERE b.ID=.
AND c.ID=.
AND a.matric_label="Enrolled"
ORDER BY a.ID, a.Term_Code DESC, a.Term;
QUIT;
DATA WORK.Want;
SET WORK.RemoveGraduates;
BY ID DESCENDING Term_Code TERM;
IF FIRST.ID AND FIRST.Term_Code AND FIRST.Term THEN OUTPUT;
RUN;
Some of the selection criteria depends on how your fields are set (numeric vs. character) but the overall logic will get what you are looking to accomplish.
DATA have;
FORMAT id 8. term $2. term_code $2. matriculant $2. matric_label $25. Cum_passed 8.;
INFORMAT id 8. term $2. term_code $2. matriculant $2. matric_label $25. Cum_passed 8.;
INPUT id term term_code matriculant matric_label Cum_passed;
CARDS;
1 FS 6 1 Enrolled 64
1 SS 7 1 Enrolled 80
1 FS 7 1 Enrolled 102
1 SS 8 2 Graduated .
2 FS 12 1 Enrolled 82
2 SS 13 1 Enrolled 94
2 FS 13 1 Enrolled 114
2 SS 14 3 Grad.andStillenrolled 120
2 FS 14 2 Graduated .
3 FS 13 1 Enrolled 6
3 SS 14 0 NotEnrolled/graduated .
3 FS 14 0 NotEnrolled/graduated .
3 SS 15 0 NotEnrolled/graduated .
3 FS 15 0 NotEnrolled/graduated .
4 FS 13 1 Enrolled 12
4 SS 14 0 NotEnrolled/graduated .
4 FS 14 1 Enrolled 17
4 SS 15 0 NotEnrolled/graduated .
4 FS 15 0 NotEnrolled/graduated .
5 FS 15 1 Enrolled 26
5 SS 16 1 Enrolled 39
5 FS 16 1 Enrolled 55
5 SS 17 1 Enrolled 68
5 FS 17 1 Enrolled 82
;
PROC SQL;
CREATE TABLE WORK.RemoveGraduates AS
SELECT a.*
FROM WORK.Have AS a
LEFT JOIN (SELECT ID
FROM WORK.HAVE
WHERE matric_label="Graduated") AS b ON a.ID=b.ID
WHERE b.ID=.
AND a.matric_label="Enrolled"
ORDER BY a.ID, a.Term DESC;
QUIT;
DATA WORK.Want;
SET WORK.RemoveGraduates;
BY ID DESCENDING TERM;
IF FIRST.ID AND FIRST.Term THEN OUTPUT;
RUN;
Thanks for your reply, it is giving me second last term for some ids.
I have attached file and highlighted some of the rows which i desire as output.
If student is enrolled throughout or graduated or graduated and enrolled these can be deleted.
I am looking for student who did not return with their last term's cumulative gpa.
I have attached my worksheet.
Made some updates overall. Give this a try:
DATA have;
FORMAT id 8. term $2. term_code 2. matriculant $2. matric_label $25. Cum_passed 8.;
INFORMAT id 8. term $2. term_code 2. matriculant $2. matric_label $25. Cum_passed 8.;
INPUT id term term_code matriculant matric_label Cum_passed;
CARDS;
1 FS 6 1 Enrolled 64
1 SS 7 1 Enrolled 80
1 FS 7 1 Enrolled 102
1 SS 8 2 Graduated .
2 FS 12 1 Enrolled 82
2 SS 13 1 Enrolled 94
2 FS 13 1 Enrolled 114
2 SS 14 3 Grad.andStillenrolled 120
2 FS 14 2 Graduated .
3 FS 13 1 Enrolled 6
3 SS 14 0 NotEnrolled/graduated .
3 FS 14 0 NotEnrolled/graduated .
3 SS 15 0 NotEnrolled/graduated .
3 FS 15 0 NotEnrolled/graduated .
4 FS 13 1 Enrolled 12
4 SS 14 0 NotEnrolled/graduated .
4 FS 14 1 Enrolled 17
4 SS 15 0 NotEnrolled/graduated .
4 FS 15 0 NotEnrolled/graduated .
5 FS 15 1 Enrolled 26
5 SS 16 1 Enrolled 39
5 FS 16 1 Enrolled 55
5 SS 17 1 Enrolled 68
5 FS 17 1 Enrolled 82
16020 FS 7 1 Enrolled 28
16020 SS 8 1 Enrolled 44
16020 FS 8 1 Enrolled 58
16020 SS 9 1 Enrolled 74
16020 FS 9 1 Enrolled 92
16020 SS 10 1 Enrolled 103
16020 FS 10 1 Enrolled 126
16020 SS 11 1 Enrolled 133
16020 FS 11 1 Enrolled 141
16020 SS 12 0 NotEnrolled/graduated .
16020 FS 12 0 NotEnrolled/graduated .
16020 SS 13 0 NotEnrolled/graduated .
17387 FS 6 1 Enrolled 1
17387 SS 7 0 NotEnrolled/graduated .
17387 FS 7 0 NotEnrolled/graduated .
17387 SS 8 0 NotEnrolled/graduated .
17387 FS 8 0 NotEnrolled/graduated .
17387 SS 9 0 NotEnrolled/graduated .
17387 FS 9 0 NotEnrolled/graduated .
17387 SS 10 0 NotEnrolled/graduated .
17387 FS 10 0 NotEnrolled/graduated .
17387 SS 11 0 NotEnrolled/graduated .
17387 FS 11 0 NotEnrolled/graduated .
17387 SS 12 0 NotEnrolled/graduated .
18088 FS 6 1 Enrolled 14
18088 SS 7 1 Enrolled 28
18088 FS 7 1 Enrolled 37
18088 SS 8 1 Enrolled 46
18088 FS 8 1 Enrolled 67
18088 SS 9 1 Enrolled 83
18088 FS 9 1 Enrolled 102
18088 SS 10 1 Enrolled 114
18088 FS 10 1 Enrolled 132
18088 SS 11 1 Enrolled 141
18088 FS 11 0 NotEnrolled/graduated .
18088 SS 12 3 Grad.andStillEnrolled 158
;
PROC SQL;
CREATE TABLE WORK.RemoveGraduates AS
SELECT a.*
FROM WORK.Have AS a
LEFT JOIN (SELECT ID
FROM WORK.HAVE
WHERE matric_label="Graduated") AS b ON a.ID=b.ID
WHERE b.ID=.
AND a.matric_label="Enrolled"
ORDER BY a.ID, a.Term_Code DESC, a.Term;
QUIT;
DATA WORK.Want;
SET WORK.RemoveGraduates;
BY ID DESCENDING Term_Code TERM;
IF FIRST.ID AND FIRST.Term_Code AND FIRST.Term THEN OUTPUT;
RUN;
This worked too, but can you help me remove who is "enrolled" for all terms.
For example PID=45.
PROC SQL;
CREATE TABLE WORK.RemoveGraduates AS
SELECT a.*, c.IDct, c.MLSum
FROM WORK.Have AS a
LEFT JOIN (SELECT ID FROM WORK.HAVE WHERE matric_label="Graduated") AS b ON a.ID=b.ID
LEFT JOIN (SELECT ID, COUNT(ID) AS IDCt, SUM(CASE WHEN matric_label="Enrolled" THEN 1 ELSE 0 END) AS MLSum FROM WORK.Have GROUP BY ID HAVING Calculated IDCt=Calculated MLSum) AS c ON a.ID=c.ID
WHERE b.ID=.
AND c.ID=.
AND a.matric_label="Enrolled"
ORDER BY a.ID, a.Term_Code DESC, a.Term;
QUIT;
DATA WORK.Want;
SET WORK.RemoveGraduates;
BY ID DESCENDING Term_Code TERM;
IF FIRST.ID AND FIRST.Term_Code AND FIRST.Term THEN OUTPUT;
RUN;
Thanks a lot. This is what I was looking for.
Much appreciate your help.
Glad I could help.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.