BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jrsm21
Calcite | Level 5

I am new to SAS and have been working on some exercises, but don't know if my code is correct. I have three data sets and need to combine all three into one and then create a new one with only unique records. 

 

 

Here are the variable definitions and my code:

SAS Dataset Name: STUDENTS
Column Name	Type	Format	
First_name	Text	$30.	
Last_name	Text	$30.	
DOB		Text	mmddyy10.
StudentID	Text	$5.	

SAS Dataset Name: COURSES
Column Name	Type	Format	
StudentID	Text	$5.	
CourseID	Text	$8.	
Course_name	Text	$30.
Begin_date	Number	mmddyy10.
End_date	Number	mmddyy10.
TestID	        Text	$10.	

SAS Dataset Name: GRADES
Column Name	Type	Format	
TestID		Text	$10.	
Test_type	Text	$5.	
Grade		Number  +3.	
Grade_date	Number	Mmddyy10.	
	

// Put all results from the three tables into one table

PROC SQL;
CREATE TABLE All_Grades AS 
      SELECT Students.*, Courses.*, Grades.* 
      FROM Students, Courses, Grades
      WHERE Students.StudentID = Courses.StudentID AND Courses.TestID = Grades.TestID;
quit;

// From the previous combined table, select unique student IDs with their the earliest test grade expressed as a year

PROC SQL; 
     CREATE TABLE Earliest_Test AS 
     SELECT DISTINCT StudentID, TestID, MIN(Test_date) num AS test_year format=YEAR4
     FROM All_Grades
     ORDER BY StudentID ; 
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

To me, it looks like that last step should be

 

PROC SQL; 
     CREATE TABLE Earliest_Grade AS 
     SELECT StudentID, TestID, grade, year(Grade_date) AS grade_year 
     FROM All_Grades
     GROUP BY StudentID
     HAVING grade_date = min(grade_date); 
quit;

and the comments syntax should be converted into proper SAS.

PG

View solution in original post

6 REPLIES 6
Reeza
Super User

So what is your question? Do you have some reason to believe the code is incorrect? Are you getting errors or issues with the results?

 

Here's some improvement suggestions but I suspect your answer is fine. 

 

  1. Listing tables separated via a comma is a cross join, which is filtered via the WHERE. You may want to explicitly define the type of joins you want instead of letting SAS figure it out. Your log may indicate that as well - check if there's a note in it. 
  2. Check the number of records and determine how many should have at the end and verify that's what you're getting. I suspect you need as many records as in GRADES in your final data set but that could depend on the logic of how the data is collected, which I do not know. 
  3. I think format=year4 should have a period at the end to define the format, but if it's causing issues I'm wrong. 

 

Given lack of data or context, I'm guessing a lot here. 

 


@jrsm21 wrote:

I am new to SAS and have been working on some exercises, but don't know if my code is correct. I have three data sets and need to combine all three into one and then create a new one with only unique records. 

 

 

Here are the variable definitions and my code:

SAS Dataset Name: STUDENTS
Column Name	Type	Format	
First_name	Text	$30.	
Last_name	Text	$30.	
DOB		Text	mmddyy10.
StudentID	Text	$5.	

SAS Dataset Name: COURSES
Column Name	Type	Format	
StudentID	Text	$5.	
CourseID	Text	$8.	
Course_name	Text	$30.
Begin_date	Number	mmddyy10.
End_date	Number	mmddyy10.
TestID	        Text	$10.	

SAS Dataset Name: GRADES
Column Name	Type	Format	
TestID		Text	$10.	
Test_type	Text	$5.	
Grade		Number  +3.	
Grade_date	Number	Mmddyy10.	
	

// Put all results from the three tables into one table

PROC SQL;
CREATE TABLE All_Grades AS 
      SELECT Students.*, Courses.*, Grades.* 
      FROM Students, Courses, Grades
      WHERE Students.StudentID = Courses.StudentID AND Courses.TestID = Grades.TestID;
quit;

// From the previous combined table, select unique student IDs with their the earliest test grade expressed as a year

PROC SQL; 
     CREATE TABLE Earliest_Test AS 
     SELECT DISTINCT StudentID, TestID, MIN(Test_date) num AS test_year format=YEAR4
     FROM All_Grades
     ORDER BY StudentID ; 
quit;

 

PGStats
Opal | Level 21

To me, it looks like that last step should be

 

PROC SQL; 
     CREATE TABLE Earliest_Grade AS 
     SELECT StudentID, TestID, grade, year(Grade_date) AS grade_year 
     FROM All_Grades
     GROUP BY StudentID
     HAVING grade_date = min(grade_date); 
quit;

and the comments syntax should be converted into proper SAS.

PG
jrsm21
Calcite | Level 5

Thanks!

Reeza
Super User
'unique' records is ambiguous in this situation. Is a unique record one per test, one per course or one per student?
jrsm21
Calcite | Level 5

Only one course record for each student. It should be the test record with the earliest test_date.
Only the variables StudentID, CourseID, Test_date, and Grade should be included in Earliest_Test data table. The test_year variable should contain the test_date year.

Reeza
Super User
Regardless of test type?

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 994 views
  • 0 likes
  • 3 in conversation