Solved: Sum of Rows Joined with Other Table

cneed · Posted 01-21-2020 04:20 PM

Hello, in question 3 I am trying to left join the data on student ID. In the Data.Grades table students can have more then 1 grade listed separately in several rows. When I join it to the Data.master_After2001 data set I am looking for the sum of the grades for each student. How can I accomplish this? My code right now is erroring on "(SUM(D.Grade)) AS Sum_of_Grades". I have provided all of my code for reference on how I got to question 3.

/*Question 1:*/
PROC SQL;
CREATE TABLE Data.master_WithSchoolInfo AS
SELECT A.studentID, A.Birthdate, A.Gender, B.SchoolID, B.SchoolName
FROM Data.master A LEFT JOIN Data.school B
ON A.SchoolID = B.SchoolID;
RUN;

PROC CONTENTS DATA = Data.master_WithSchoolInfo;
RUN;

/*Question 2:*/

PROC SQL;
CREATE TABLE Data.master_After2001 AS
SELECT *
FROM Data.master_WithSchoolInfo
WHERE Birthdate > '01Jan2001'd;
RUN;

PROC CONTENTS DATA = Data.master_After2001;
RUN;

/*Question 3:*/

PROC SQL;
CREATE TABLE Data.master_After2001WithGrades AS
SELECT C.studentID, C.Birthdate, C.Gender, C.SchoolID, C.SchoolName, D.studentID, (SUM(D.Grade)) AS Sum_of_Grades
FROM Data.master_After2001 C Left Join Data.Grades D
ON C.studentID = D.studentID
GROUP BY D.Grade;
RUN;

PROC CONTENTS DATA = Data.master_After2001WithGrades;
RUN;

cneed · Posted 01-21-2020 05:14 PM

Still showing the error "the SUM summary function requires a numeric argument." I am trying to join the data.grades into the data.master_After2001 table, however, a student in the data file could have more then 1 grade. I am just looking for the amount of grades a student has to be entered into a new row in the Data.master_After2001 file.

Data.master_After2001:

Data.grades:

View solution in original post

PaigeMiller · Posted 01-21-2020 04:27 PM

It would be helpful if you showed us the LOG that gives the error (the code as seen in the log AND the errors as seen in the log).

It would be helpful if you showed us a portion of your data.

As a wild guess, if you have SUM(D.GRADE) and GROUP BY (D.GRADE), I'm thinking that might be a problem.

--
Paige Miller

cneed · Posted 01-21-2020 04:31 PM

Here Is what the error log is showing:

75 SELECT C.studentID, C.Birthdate, C.Gender, C.SchoolID, C.SchoolName, D.studentID, (SUM(D.Grade)) AS Sum_of_Grades

76 FROM Data.master_After2001 C Left Join Data.Grades D

77 ON C.studentID = D.studentID;

ERROR: The SUM summary function requires a numeric argument.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

I tried removing the group by but still no luck.

cneed · Posted 01-21-2020 04:54 PM

Also, here is a picture of the Data.grades table.

PGStats · Posted 01-21-2020 05:04 PM

Should be

PROC SQL;
CREATE TABLE Data.master_After2001WithGrades AS
SELECT 
	C.studentID, 
	C.Birthdate, 
	C.Gender, 
	C.SchoolID, 
	C.SchoolName, 
	SUM(D.Grade) AS Sum_of_Grades
FROM 
	Data.master_After2001 as C Left Join 
	Data.Grades as D
ON C.studentID = D.studentID
GROUP BY C.studentID, C.Birthdate, C.Gender, C.SchoolID, C.SchoolName;
QUIT;

(unntested)

PG

cneed · Posted 01-21-2020 05:14 PM

Still showing the error "the SUM summary function requires a numeric argument." I am trying to join the data.grades into the data.master_After2001 table, however, a student in the data file could have more then 1 grade. I am just looking for the amount of grades a student has to be entered into a new row in the Data.master_After2001 file.

Data.master_After2001:

Data.grades:

PaigeMiller · Posted 01-21-2020 05:42 PM

Aha!

You are trying to sum letters A B C D F. Do you see why it doesn't work?

--
Paige Miller

hashman · Posted 01-21-2020 06:49 PM

@cneed:

As a simple remedy, you could just insert DISTINCT between SELECT and C.studentID, and you would not have the dupes. The query offered by @PGStats results in the same (in his query, though, SUM should be replaced by COUNT, or else it won't work). However, neither is efficient. It makes more sense to aggregate the Grades file first, which will collapse it to unique StudentID key-values, and then join the result with Master:

proc sql ;                                                                                                                                       
  create table data.master_after2001withgrades as                                                                                                
  select m.studentid                                                                                                                             
       , m.birthdate                                                                                                                             
       , m.gender                                                                                                                                
       , m.schoolid                                                                                                                              
       , m.schoolname                                                                                                                            
       , g.grade_count                                                                                                                           
  from   data.master_after2001 c                                                                                                                 
    left join                                                                                                                                    
        (select studentid                                                                                                                        
              , count (grade) as grade_count                                                                                                     
         from   grades group studentid                                                                                                           
         ) g                                                                                                                                     
  on     m.studentid = g.studentid                                                                                                               
  ;                                                                                                                                              
quit ;

Alternatively, it can be achieved by applying the same concept by first aggregating Grades via a hash table:

data want (drop = grade) ;                                                                                                                       
  if _n_ = 1 then do ;                                                                                                                           
    dcl hash h () ;                                                                                                                              
    h.definekey ("studentid") ;                                                                                                                  
    h.definedata ("grade_count") ;                                                                                                               
    h.definedone () ;                                                                                                                            
    do until (z) ;                                                                                                                               
      set grades end = z ;                                                                                                                       
      if h.find() ne 0 then grade_count = 1 ;                                                                                                    
      else                  grade_count + 1 ;                                                                                                    
      h.replace() ;                                                                                                                              
    end ;                                                                                                                                        
  end ;                                                                                                                                          
  set master (keep = studentid birthdate gender schoolid schoolname) ;                                                                           
  if h.find() ne 0 then call missing (grade_count) ;                                                                                             
run ;

Neither SQL nor hash require the input data sets to be sorted by StudentID. However, if they are already sorted, the easiest and fastest way of getting what you want is a simple merge:

data want (drop = grade) ;                                                                                                                       
  do until (last.studentid) ;                                                                                                                    
    merge master (keep = studentid birthdate gender schoolid schoolname) grades (in = g) ;                                                       
    by studentid ;                                                                                                                               
    if g then grade_count = sum (grade_count, 1) ;                                                                                               
  end ;                                                                                                                                          
run ;

Of course, there're other ways of achieving the same result using more than a single step.

Kind regards

Paul D.

PaigeMiller · Posted 01-22-2020 06:54 AM

The query offered by @PGStats results in the same (in his query, though, SUM should be replaced by COUNT, or else it won't work)

I don't see how COUNT works any better than SUM, as it does not compute frequency of A and frequency of B, etc. if that's what is being requested — clearly SUM makes no sense here. But @cneed really needs to state clearly the output he wants.

--
Paige Miller

hashman · Posted 01-22-2020 09:44 AM

@PaigeMiller:

Grade is a character column here, hence SUM would generate:

ERROR: The SUM summary function requires a numeric argument.

SUM (grade ne "") could be used but why when COUNT will do. On the other note, it does not follow from the OP's description that the count distinct is requested rather than the straight count. If so, it's easy to change, whether it's SQL, hash, or merge.

Kind regards

Paul D.

Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

Re: Sum of Rows Joined with Other Table

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away