Hello, I am trying to prepare a data set that includes the population of students in my file Data.master_After2001:
I am looking to calculate each students GPA. GPA is calculated as the sum of grades/number of courses and grades can be summed by assigning A=4, B=3, C=2, D=1, and F=0. Each student can have more then one grade. How can I calculate the GPA for each student then join it into the table above?
Data set Data.grades:
Thanks!
data master_after2001;
input studentID $ birthdate :DATE9. gender $ schoolID schoolname $;
format birthdate date9.;
datalines;
11931 11APR2001 M 1001 Field
11421 13FEB2001 M 1001 Field
12411 20OCT2002 F 1001 Field
10871 12AUG2002 F 1001 Field
12581 13JAN2003 F 1001 Field
10851 24AUG2002 F 1001 Field
10001 15JUL2004 M 1001 Field
;
run;
data grades;
input studentID $ courseID $ grade $ ;
datalines;
10001 113 F
10001 139 A
10001 118 B
10002 183 B
10002 192 F
10002 148 D
10002 198 D
10002 151 C
;
run;
proc format library=work;
invalue grade
'A'=4
'B'=3
'C'=2
'D'=1
'F'=0
;
run;
proc sql;
create table gpa as
select t1.*, t2.GPA
from master_after2001 t1
left join ( select studentID, sum(input(grade,grade.)) as GPA
from grades
group by studentID ) t2
on t1.studentID = t2.studentID
;
quit;
I added another row to master_after2001 so that you can see the results if there are IDs that match.
@sustagensthis would be awesome if I only had a few students, however, I have a few thousand students in my file (I just sent a screenshot of a few to get the layout of the data). It there a way to make it so I don't have to type out every row?
I have a few thousand grades as well.
@cneed wrote:
@sustagensthis would be awesome if I only had a few students, however, I have a few thousand students in my file (I just sent a screenshot of a few to get the layout of the data). It there a way to make it so I don't have to type out every row?
???? Don't you have this data already in SAS data sets?
The data steps @sustagens posted are only creating sample data (as you haven't provided these in a usable form) to allow providing fully tested solution code.
@Patrick and @sustagens thanks for the input! I have attached the data for reference.
@Patrick and @sustagens here is the second file. It didnt attach to the first.
Don't you also have a grades data set? Just use these two data sets in the SQL as posted by @sustagens
What happens if you do so and execute the code?
Hi @cneed, you dont have to type your data out row by row. I just created a sample dataset, as @Patrick mentioned, based on your screenshots so I can write up code for it.
In your case you can do without with the first two blocks, because your data already exists as SAS datasets.
This will be what you would use:
proc format library=work;
invalue grade
'A'=4
'B'=3
'C'=2
'D'=1
'F'=0
;
run;
proc sql;
create table gpa as
select t1.*, t2.GPA
from Data.master_After2001 t1
left join ( select studentID, sum(input(grade,grade.)) as GPA
from Data.grades
group by studentID ) t2
on t1.studentID = t2.studentID
;
quit;
@sustagens Thanks! It came out great! The only thing is the the GPA is calculated by the sum divided by the number of courses taken (number of grades provided). How can I incorporate this in?
GPA would then be:
(sum(input(grade,grade.))/count(courseID)) as GPA
Instead of just:
sum(input(grade,grade.)) as GPA
@sustagens I am now showing blanks in the GPA column. I am getting a few notes that are saying t1.studentID = t2.studentID is invalid:
Error log:
Here is my code:
proc format library=Data;
invalue grade
'A'=4
'B'=3
'C'=2
'D'=1
'F'=0;
run;
proc sql;
create table Data.gpa as
select t1.*, t2.GPA
from Data.master_after2001 as t1
left join ( select studentID, (sum(input(grade,grade.))/count(courseID)) as GPA
from Data.grades
group by studentID ) as t2
on t1.studentID = t2.studentID;
quit;
@andreas_lds Yes, all of the GPAs are missing
Do you mean adding other =. instead of library Data like this?
proc format library other=.;
invalue grade
'A'=4
'B'=3
'C'=2
'D'=1
'F'=0;
run;
proc sql;
create table Data.gpa as
select t1.*, t2.GPA
from Data.master_after2001 t1
left join ( select studentID, (sum(input(grade,grade.))/count(courseID)) as GPA
from Data.grades
group by studentID ) t2
on t1.studentID = t2.studentID;
quit;
Hi @cneed, looking at the actual datasets you attached, not all of the studentIDs are in both tables. You have 55 students that are in master_after2001 but not in grades. Because we are looking up (left join) the grades based on all the studentIDs in the master table, any studentID it can't find in grades will return null.
You may run this query to view the 55 rows:
PROC SQL;
CREATE TABLE studentIDs_not_in_grades_table AS
SELECT t1.studentID,
t1.birthdate,
t1.gender,
t1.schoolID,
t1.schoolname
FROM Data.master_after2001 t1
WHERE (t1.studentID NOT IN (select distinct studentID from Data.grades))
ORDER BY t1.studentID;
QUIT;
You may follow @andreas_lds' advice to add other=. to the informat.
Otherwise if you only want to keep studentIDs that exist in both tables, switch to an inner join instead of a left join.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.