BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cneed
Obsidian | Level 7

Hello, in question 3 I am trying to left join the data on student ID. In the Data.Grades table students can have more then 1 grade listed separately in several rows. When I join it to the Data.master_After2001 data set I am looking for the sum of the grades for each student. How can I accomplish this? My code right now is erroring on "(SUM(D.Grade)) AS Sum_of_Grades". I have provided all of my code for reference on how I got to question 3.

 

 

/*Question 1:*/
PROC SQL;
CREATE TABLE Data.master_WithSchoolInfo AS
SELECT A.studentID, A.Birthdate, A.Gender, B.SchoolID, B.SchoolName
FROM Data.master A LEFT JOIN Data.school B
ON A.SchoolID = B.SchoolID;
RUN;

PROC CONTENTS DATA = Data.master_WithSchoolInfo;
RUN;

 

/*Question 2:*/

PROC SQL;
CREATE TABLE Data.master_After2001 AS
SELECT *
FROM Data.master_WithSchoolInfo
WHERE Birthdate > '01Jan2001'd;
RUN;

PROC CONTENTS DATA = Data.master_After2001;
RUN;


/*Question 3:*/

PROC SQL;
CREATE TABLE Data.master_After2001WithGrades AS
SELECT C.studentID, C.Birthdate, C.Gender, C.SchoolID, C.SchoolName, D.studentID, (SUM(D.Grade)) AS Sum_of_Grades
FROM Data.master_After2001 C Left Join Data.Grades D
ON C.studentID = D.studentID
GROUP BY D.Grade;
RUN;

PROC CONTENTS DATA = Data.master_After2001WithGrades;
RUN;

1 ACCEPTED SOLUTION

Accepted Solutions
cneed
Obsidian | Level 7

Still showing the error "the SUM summary function requires a numeric argument." I am trying to join the data.grades into the data.master_After2001 table, however, a student in the data file could have more then 1 grade. I am just looking for the amount of grades a student has to be entered into a new row in the Data.master_After2001 file.

 

Data.master_After2001:

data.master_after2001.PNG

 

Data.grades:

Data.grades.PNG

 

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

It would be helpful if you showed us the LOG that gives the error (the code as seen in the log AND the errors as seen in the log).

 

It would be helpful if you showed us a portion of your data.

 

As a wild guess, if you have SUM(D.GRADE) and GROUP BY (D.GRADE), I'm thinking that might be a problem.

--
Paige Miller
cneed
Obsidian | Level 7

Here Is what the error log is showing:

 

75 SELECT C.studentID, C.Birthdate, C.Gender, C.SchoolID, C.SchoolName, D.studentID, (SUM(D.Grade)) AS Sum_of_Grades
76 FROM Data.master_After2001 C Left Join Data.Grades D
77 ON C.studentID = D.studentID;
ERROR: The SUM summary function requires a numeric argument.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
 
I tried removing the group by but still no luck.
cneed
Obsidian | Level 7

Also, here is a picture of the Data.grades table.

 

Data.grades.PNG

 

PGStats
Opal | Level 21

Should be

 

PROC SQL;
CREATE TABLE Data.master_After2001WithGrades AS
SELECT 
	C.studentID, 
	C.Birthdate, 
	C.Gender, 
	C.SchoolID, 
	C.SchoolName, 
	SUM(D.Grade) AS Sum_of_Grades
FROM 
	Data.master_After2001 as C Left Join 
	Data.Grades as D
ON C.studentID = D.studentID
GROUP BY C.studentID, C.Birthdate, C.Gender, C.SchoolID, C.SchoolName;
QUIT;

(unntested)

PG
cneed
Obsidian | Level 7

Still showing the error "the SUM summary function requires a numeric argument." I am trying to join the data.grades into the data.master_After2001 table, however, a student in the data file could have more then 1 grade. I am just looking for the amount of grades a student has to be entered into a new row in the Data.master_After2001 file.

 

Data.master_After2001:

data.master_after2001.PNG

 

Data.grades:

Data.grades.PNG

 

PaigeMiller
Diamond | Level 26

Aha!

 

You are trying to sum letters A B C D F. Do you see why it doesn't work?

--
Paige Miller
hashman
Ammonite | Level 13

@cneed:

As a simple remedy, you could just insert DISTINCT between SELECT and C.studentID, and you would not have the dupes. The query offered by @PGStats results in the same (in his query, though, SUM should be replaced by COUNT, or else it won't work). However, neither is efficient. It makes more sense to aggregate the Grades file first,  which will collapse it to unique StudentID key-values, and then join the result with Master:

proc sql ;                                                                                                                                       
  create table data.master_after2001withgrades as                                                                                                
  select m.studentid                                                                                                                             
       , m.birthdate                                                                                                                             
       , m.gender                                                                                                                                
       , m.schoolid                                                                                                                              
       , m.schoolname                                                                                                                            
       , g.grade_count                                                                                                                           
  from   data.master_after2001 c                                                                                                                 
    left join                                                                                                                                    
        (select studentid                                                                                                                        
              , count (grade) as grade_count                                                                                                     
         from   grades group studentid                                                                                                           
         ) g                                                                                                                                     
  on     m.studentid = g.studentid                                                                                                               
  ;                                                                                                                                              
quit ;       

Alternatively, it can be achieved by applying the same concept by first aggregating Grades via a hash table:

data want (drop = grade) ;                                                                                                                       
  if _n_ = 1 then do ;                                                                                                                           
    dcl hash h () ;                                                                                                                              
    h.definekey ("studentid") ;                                                                                                                  
    h.definedata ("grade_count") ;                                                                                                               
    h.definedone () ;                                                                                                                            
    do until (z) ;                                                                                                                               
      set grades end = z ;                                                                                                                       
      if h.find() ne 0 then grade_count = 1 ;                                                                                                    
      else                  grade_count + 1 ;                                                                                                    
      h.replace() ;                                                                                                                              
    end ;                                                                                                                                        
  end ;                                                                                                                                          
  set master (keep = studentid birthdate gender schoolid schoolname) ;                                                                           
  if h.find() ne 0 then call missing (grade_count) ;                                                                                             
run ;            

Neither SQL nor hash require the input data sets to be sorted by StudentID. However, if they are already sorted, the easiest and fastest way of getting what you want is a simple merge:

data want (drop = grade) ;                                                                                                                       
  do until (last.studentid) ;                                                                                                                    
    merge master (keep = studentid birthdate gender schoolid schoolname) grades (in = g) ;                                                       
    by studentid ;                                                                                                                               
    if g then grade_count = sum (grade_count, 1) ;                                                                                               
  end ;                                                                                                                                          
run ;  

Of course, there're other ways of achieving the same result using more than a single step. 

 

Kind regards

Paul D. 

PaigeMiller
Diamond | Level 26

The query offered by @PGStats results in the same (in his query, though, SUM should be replaced by COUNT, or else it won't work)

 

I don't see how COUNT works any better than SUM, as it does not compute frequency of A and frequency of B, etc. if that's what is being requested — clearly SUM makes no sense here. But @cneed really needs to state clearly the output he wants.

--
Paige Miller
hashman
Ammonite | Level 13

@PaigeMiller:

 

Grade is a character column here, hence SUM would generate:

ERROR: The SUM summary function requires a numeric argument.

SUM (grade ne "") could be used but why when COUNT will do. On the other note, it does not follow from the OP's description that the count distinct is requested rather than the straight count. If so, it's easy to change, whether it's SQL, hash, or merge.

 

Kind regards

Paul D. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 870 views
  • 2 likes
  • 4 in conversation