BookmarkSubscribeRSS Feed
cneed
Obsidian | Level 7

Hello, I am trying to prepare a data set that includes the population of students in my file Data.master_After2001:

 

data.master_after2001.PNG

 

I am looking to calculate each students GPA. GPA is calculated as the sum of grades/number of courses and grades can be summed by assigning A=4, B=3, C=2, D=1, and F=0. Each student can have more then one grade. How can I calculate the GPA for each student then join it into the table above?

 

Data set Data.grades:

Data.grades.PNG

 

Thanks!

14 REPLIES 14
sustagens
Pyrite | Level 9
data master_after2001;
input studentID $ birthdate :DATE9. gender $ schoolID schoolname $;
format birthdate date9.;
datalines;
11931 11APR2001 M 1001 Field
11421 13FEB2001 M 1001 Field
12411 20OCT2002 F 1001 Field
10871 12AUG2002 F 1001 Field
12581 13JAN2003 F 1001 Field
10851 24AUG2002 F 1001 Field
10001 15JUL2004 M 1001 Field
;
run;
data grades; input studentID $ courseID $ grade $ ; datalines; 10001 113 F 10001 139 A 10001 118 B 10002 183 B 10002 192 F 10002 148 D 10002 198 D 10002 151 C ; run; proc format library=work; invalue grade 'A'=4 'B'=3 'C'=2 'D'=1 'F'=0 ; run; proc sql; create table gpa as select t1.*, t2.GPA from master_after2001 t1 left join ( select studentID, sum(input(grade,grade.)) as GPA from grades group by studentID ) t2 on t1.studentID = t2.studentID ; quit;

I added another row to master_after2001 so that you can see the results if there are IDs that match.

cneed
Obsidian | Level 7

@sustagensthis would be awesome if I only had a few students, however, I have a few thousand students in my file (I just sent a screenshot of a few to get the layout of the data). It there a way to make it so I don't have to type out every row?

cneed
Obsidian | Level 7

I have a few thousand grades as well.

Patrick
Opal | Level 21

@cneed wrote:

@sustagensthis would be awesome if I only had a few students, however, I have a few thousand students in my file (I just sent a screenshot of a few to get the layout of the data). It there a way to make it so I don't have to type out every row?


???? Don't you have this data already in SAS data sets?

The data steps @sustagens posted are only creating sample data (as you haven't provided these in a usable form) to allow providing fully tested solution code.

cneed
Obsidian | Level 7

@Patrick  and @sustagens  thanks for the input! I have attached the data for reference.

cneed
Obsidian | Level 7

@Patrick  and @sustagens  here is the second file. It didnt attach to the first.

Patrick
Opal | Level 21

Don't you also have a grades data set? Just use these two data sets in the SQL as posted by @sustagens 

What happens if you do so and execute the code?

sustagens
Pyrite | Level 9

Hi @cneed, you dont have to type your data out row by row. I just created a sample dataset, as @Patrick mentioned, based on your screenshots so I can write up code for it. 

 

In your case you can do without with the first two blocks, because your data already exists as SAS datasets.

 

This will be what you would use:

 

proc format library=work;
invalue grade
'A'=4
'B'=3
'C'=2
'D'=1
'F'=0
;
run;

proc sql;
create table gpa as
select t1.*, t2.GPA 
from Data.master_After2001 t1 
left join ( select studentID, sum(input(grade,grade.)) as GPA
			from Data.grades
			group by studentID ) t2 
on t1.studentID = t2.studentID
;
quit;
cneed
Obsidian | Level 7

@sustagens  Thanks! It came out great! The only thing is the the GPA is calculated by the sum divided by the number of courses taken (number of grades provided). How can I incorporate this in?

sustagens
Pyrite | Level 9

GPA would then be:

(sum(input(grade,grade.))/count(courseID)) as GPA

 

Instead of just:

sum(input(grade,grade.)) as GPA
cneed
Obsidian | Level 7

@sustagens  I am now showing blanks in the GPA column. I am getting a few notes that are saying t1.studentID = t2.studentID is invalid:

 

Error log:

 

 

82 proc sql;
83 create table Data.gpa as
84 select t1.*, t2.GPA
85 from Data.master_after2001 as t1
86 left join ( select studentID, (sum(input(grade,grade.))/count(courseID)) as GPA
87 from Data.grades
88 group by studentID ) as t2
89 on t1.studentID = t2.studentID;
NOTE: Invalid string.
NOTE: Invalid argument to function INPUT. Missing values may be generated.
NOTE: Table DATA.GPA created, with 3427 rows and 6 columns.
 
90 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1.11 seconds
cpu time 1.19 seconds
 
 

 

 

Here is my code:

 

 

proc format library=Data;
invalue grade
'A'=4
'B'=3
'C'=2
'D'=1
'F'=0;
run;

proc sql;
create table Data.gpa as
select t1.*, t2.GPA
from Data.master_after2001 as t1
left join ( select studentID, (sum(input(grade,grade.))/count(courseID)) as GPA
from Data.grades
group by studentID ) as t2
on t1.studentID = t2.studentID;
quit;

andreas_lds
Jade | Level 19
Is the variable grades non-missing in each obs? Adding other=. To the informat should help
cneed
Obsidian | Level 7

@andreas_lds  Yes, all of the GPAs are missing

 

gpa.PNG

 

 

Do you mean adding other =. instead of library Data like this?

 

proc format library other=.;
invalue grade
'A'=4
'B'=3
'C'=2
'D'=1
'F'=0;
run;

proc sql;
create table Data.gpa as
select t1.*, t2.GPA
from Data.master_after2001 t1
left join ( select studentID, (sum(input(grade,grade.))/count(courseID)) as GPA
from Data.grades
group by studentID ) t2
on t1.studentID = t2.studentID;
quit;

sustagens
Pyrite | Level 9

Hi @cneed, looking at the actual datasets you attached, not all of the studentIDs are in both tables. You have 55 students that are in master_after2001 but not in grades. Because we are looking up (left join) the grades based on all the studentIDs in the master table, any studentID it can't find in grades will return null. 

 

You may run this query to view the 55 rows:

 

PROC SQL;
   CREATE TABLE studentIDs_not_in_grades_table AS 
   SELECT t1.studentID, 
          t1.birthdate, 
          t1.gender, 
          t1.schoolID, 
          t1.schoolname
      FROM Data.master_after2001 t1
      WHERE (t1.studentID NOT IN (select distinct studentID from Data.grades))
      ORDER BY t1.studentID;
QUIT;

 

You may follow @andreas_lds' advice to add other=. to the informat.

Otherwise if you only want to keep studentIDs that exist in both tables, switch to an inner join instead of a left join.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 1238 views
  • 4 likes
  • 4 in conversation