BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
twildone
Pyrite | Level 9

Hi....I am trying to sum the number of Course Hours for each StudentUID in the program that they are enrolled. The problem is that there are duplicate records for the grouped variables of StudentUID, Initial Program, Level and CourseName. I tried to include "First.CourseName" in the case when statement but got an error message. I need to only have unique records to be selected and then sum those but I can't seem to get what I need. Maybe there is a better way to get the Program Hours. Thanks.

proc sql noprint;
	create table List109G50 as  
		select  
   		*,
		sum(select 
				B.ProgHours
			from ((select 	
						List109G4.CourseName,
						(case when not missing(List109G4.'Course Hours'n) 
							then List109G4.'Course Hours'n 
							else . 
							end) format=8.2 as ProgHours
      				from work.List109G4 B
	  				group by List109G4.StudentUID, List109G4.'Initial Program'n,List109G4.Level,List109G4.CourseName) B)) format=8.2 as 'Program Hours'n
      from work.List109G4
	  group by List109G4.StudentUID, List109G4.'Initial Program'n,List109G4.Level
	  order by List109G4.StudentUID, List109G4.'Initial Program'n;
quit;


LOG:
28         GOPTIONS ACCESSIBLE;


29         proc sql noprint;
30         	create table List109G50 as
31         		select
32            		*,
33         		sum(select
34         				B.ProgHours
               _
               22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, 
              CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.  

35         			from ((select 	
              ____
              22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, ), *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, 
              CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.  

36         						List109G4.CourseName,
37         						(case when not missing(List109G4.'Course Hours'n)
38         							then List109G4.'Course Hours'n
39         							else .
40         							end) format=8.2 as ProgHours
41               				from work.List109G4 B
42         	  				group by List109G4.StudentUID, List109G4.'Initial
42       ! Program'n,List109G4.Level,List109G4.CourseName) B)) format=8.2 as 'Program Hours'n
                                                           _
                                                           22
2                                                          The SAS System                                09:30 Tuesday, June 8, 2021

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, CONTAINS, 
              EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.  

43               from work.List109G4
44         	  group by List109G4.StudentUID, List109G4.'Initial
44       ! Program'n,List109G4.Level
45         	  order by List109G4.StudentUID, List109G4.'Initial Program'n;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
46         quit;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Step 1: get rid of those stupid name literals, use SAS names instead. Makes coding so much easier

Step 2: character variables need no formats.

Step 3: use a sub-select that deduplicates on a course level; I use the MAX summary function, bur you may contemplate MEAN if the number of hours changes between semesters

Step 4: sum up the result of step 3 in an outer SELECT.

data List109G4;
input StudentUID :$5. Initial_Program :$20. Level :$10. CourseName :$15. Semester :$10. Course_Hours;
datalines;
1001 AutoMechanics 01 Brakes Fall 3
1001 AutoMechanics 01 Suspension Fall 4
1001 AutoMechanics 01 Electrical Fall 4
1001 AutoMechanics 01 Suspension Winter 4
1001 AutoMechanics 01 Electrical Winter 4
1001 AutoMechanics 01 Suspension Spring 4
1001 AutoMechanics 01 FuelSystem Spring 2
;

proc sql;
create table want as
  select
    studentuid,
    initial_program,
    level,
    sum(course_hours) as course_hours
  from (
    select
      studentuid,
      initial_program,
      level,
      coursename,
      max(course_hours) as course_hours
    from list109g4
    group by studentuid, initial_program, level, coursename
  )
  group by studentuid, initial_program, level
;
quit;

The same can be achieved by running a PROC SORT with NODUPKEY (if hours do not change), and a subsequent PROC SUMMARY:

proc sort
  data=List109G4
  out=dedup
  nodupkey
;
by studentuid initial_program level coursename;
run;

proc summary data=dedup nway;
by studentuid initial_program level;
var course_hours;
output out=want2 (drop=_type_ _freq_) sum()=;
run;

You can use a proc summary to dedup also:

proc summary data=List109G4 nway;
class studentuid initial_program level coursename;
var course_hours;
output out=dedup (drop=_type_ _freq_) max()=;
run;

I used CLASS here because the source dataset is not sorted in strict ascending order, you would need a PROC SORT first to use BY. You can always use CLASS instead of BY as long as the summary data can fit into memory. 

View solution in original post

5 REPLIES 5
Sajid01
Meteorite | Level 14

1.The problem is that there are duplicate records for the grouped variables of StudentUID, Initial Program, Level and CourseName.
Please remove the duplicates

2.The program you have shown has syntax errors.
   Simplify the code. Please test the do test the nested queries first.

Kurt_Bremser
Super User

Please supply an example for the LIST109G4 dataset in usable form (data step with datalines), and what you want to get out of it.

I have an inkling you are overcomplicating things.

twildone
Pyrite | Level 9

Hi Kurt....I have created an example below. StudentUID:1001 is enrolled in the AutoMechanics program. I would like to sum up the Course Hours from each CourseName for the Program. In this example, the Program Hours should be 13. The duplicate courses are in the dataset because some courses run more than one Semester. So Program Hours should equal the sum of the Course Hours from distinct CourseNames. Thanks.

 

 

data List109G4;
input StudentUID :$5. 'Initial Program'n :$20. Level :$10. CourseName :$15. Semester :$10. 'Course Hours'n;
format StudentUID $char5. 'Initial Program'n $char20. Level $char10. CourseName $char15. Semester $char10. 'Course Hours'n;
datalines;
1001 AutoMechanics 01 Brakes Fall 3
1001 AutoMechanics 01 Suspension Fall 4
1001 AutoMechanics 01 Electrical Fall 4
1001 AutoMechanics 01 Suspension Winter 4
1001 AutoMechanics 01 Electrical Winter 4
1001 AutoMechanics 01 Suspension Spring 4
1001 AutoMechanics 01 FuelSystem Spring 2
;
Kurt_Bremser
Super User

Step 1: get rid of those stupid name literals, use SAS names instead. Makes coding so much easier

Step 2: character variables need no formats.

Step 3: use a sub-select that deduplicates on a course level; I use the MAX summary function, bur you may contemplate MEAN if the number of hours changes between semesters

Step 4: sum up the result of step 3 in an outer SELECT.

data List109G4;
input StudentUID :$5. Initial_Program :$20. Level :$10. CourseName :$15. Semester :$10. Course_Hours;
datalines;
1001 AutoMechanics 01 Brakes Fall 3
1001 AutoMechanics 01 Suspension Fall 4
1001 AutoMechanics 01 Electrical Fall 4
1001 AutoMechanics 01 Suspension Winter 4
1001 AutoMechanics 01 Electrical Winter 4
1001 AutoMechanics 01 Suspension Spring 4
1001 AutoMechanics 01 FuelSystem Spring 2
;

proc sql;
create table want as
  select
    studentuid,
    initial_program,
    level,
    sum(course_hours) as course_hours
  from (
    select
      studentuid,
      initial_program,
      level,
      coursename,
      max(course_hours) as course_hours
    from list109g4
    group by studentuid, initial_program, level, coursename
  )
  group by studentuid, initial_program, level
;
quit;

The same can be achieved by running a PROC SORT with NODUPKEY (if hours do not change), and a subsequent PROC SUMMARY:

proc sort
  data=List109G4
  out=dedup
  nodupkey
;
by studentuid initial_program level coursename;
run;

proc summary data=dedup nway;
by studentuid initial_program level;
var course_hours;
output out=want2 (drop=_type_ _freq_) sum()=;
run;

You can use a proc summary to dedup also:

proc summary data=List109G4 nway;
class studentuid initial_program level coursename;
var course_hours;
output out=dedup (drop=_type_ _freq_) max()=;
run;

I used CLASS here because the source dataset is not sorted in strict ascending order, you would need a PROC SORT first to use BY. You can always use CLASS instead of BY as long as the summary data can fit into memory. 

twildone
Pyrite | Level 9

Hi Kurt.....your suggestion worked like I was hoping and needed to obtained the output I was looking for....Thanks a ton for all your help and suggestions....greatly appreciated.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1010 views
  • 0 likes
  • 3 in conversation