Hi....I am trying to sum the number of Course Hours for each StudentUID in the program that they are enrolled. The problem is that there are duplicate records for the grouped variables of StudentUID, Initial Program, Level and CourseName. I tried to include "First.CourseName" in the case when statement but got an error message. I need to only have unique records to be selected and then sum those but I can't seem to get what I need. Maybe there is a better way to get the Program Hours. Thanks.
proc sql noprint; create table List109G50 as select *, sum(select B.ProgHours from ((select List109G4.CourseName, (case when not missing(List109G4.'Course Hours'n) then List109G4.'Course Hours'n else . end) format=8.2 as ProgHours from work.List109G4 B group by List109G4.StudentUID, List109G4.'Initial Program'n,List109G4.Level,List109G4.CourseName) B)) format=8.2 as 'Program Hours'n from work.List109G4 group by List109G4.StudentUID, List109G4.'Initial Program'n,List109G4.Level order by List109G4.StudentUID, List109G4.'Initial Program'n; quit; LOG: 28 GOPTIONS ACCESSIBLE; 29 proc sql noprint; 30 create table List109G50 as 31 select 32 *, 33 sum(select 34 B.ProgHours _ 22 ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=. 35 from ((select ____ 22 ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, ), *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=. 36 List109G4.CourseName, 37 (case when not missing(List109G4.'Course Hours'n) 38 then List109G4.'Course Hours'n 39 else . 40 end) format=8.2 as ProgHours 41 from work.List109G4 B 42 group by List109G4.StudentUID, List109G4.'Initial 42 ! Program'n,List109G4.Level,List109G4.CourseName) B)) format=8.2 as 'Program Hours'n _ 22 2 The SAS System 09:30 Tuesday, June 8, 2021 ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=. 43 from work.List109G4 44 group by List109G4.StudentUID, List109G4.'Initial 44 ! Program'n,List109G4.Level 45 order by List109G4.StudentUID, List109G4.'Initial Program'n; NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 46 quit;
Step 1: get rid of those stupid name literals, use SAS names instead. Makes coding so much easier
Step 2: character variables need no formats.
Step 3: use a sub-select that deduplicates on a course level; I use the MAX summary function, bur you may contemplate MEAN if the number of hours changes between semesters
Step 4: sum up the result of step 3 in an outer SELECT.
data List109G4;
input StudentUID :$5. Initial_Program :$20. Level :$10. CourseName :$15. Semester :$10. Course_Hours;
datalines;
1001 AutoMechanics 01 Brakes Fall 3
1001 AutoMechanics 01 Suspension Fall 4
1001 AutoMechanics 01 Electrical Fall 4
1001 AutoMechanics 01 Suspension Winter 4
1001 AutoMechanics 01 Electrical Winter 4
1001 AutoMechanics 01 Suspension Spring 4
1001 AutoMechanics 01 FuelSystem Spring 2
;
proc sql;
create table want as
select
studentuid,
initial_program,
level,
sum(course_hours) as course_hours
from (
select
studentuid,
initial_program,
level,
coursename,
max(course_hours) as course_hours
from list109g4
group by studentuid, initial_program, level, coursename
)
group by studentuid, initial_program, level
;
quit;
The same can be achieved by running a PROC SORT with NODUPKEY (if hours do not change), and a subsequent PROC SUMMARY:
proc sort
data=List109G4
out=dedup
nodupkey
;
by studentuid initial_program level coursename;
run;
proc summary data=dedup nway;
by studentuid initial_program level;
var course_hours;
output out=want2 (drop=_type_ _freq_) sum()=;
run;
You can use a proc summary to dedup also:
proc summary data=List109G4 nway;
class studentuid initial_program level coursename;
var course_hours;
output out=dedup (drop=_type_ _freq_) max()=;
run;
I used CLASS here because the source dataset is not sorted in strict ascending order, you would need a PROC SORT first to use BY. You can always use CLASS instead of BY as long as the summary data can fit into memory.
1.The problem is that there are duplicate records for the grouped variables of StudentUID, Initial Program, Level and CourseName.
Please remove the duplicates
2.The program you have shown has syntax errors.
Simplify the code. Please test the do test the nested queries first.
Please supply an example for the LIST109G4 dataset in usable form (data step with datalines), and what you want to get out of it.
I have an inkling you are overcomplicating things.
Hi Kurt....I have created an example below. StudentUID:1001 is enrolled in the AutoMechanics program. I would like to sum up the Course Hours from each CourseName for the Program. In this example, the Program Hours should be 13. The duplicate courses are in the dataset because some courses run more than one Semester. So Program Hours should equal the sum of the Course Hours from distinct CourseNames. Thanks.
data List109G4; input StudentUID :$5. 'Initial Program'n :$20. Level :$10. CourseName :$15. Semester :$10. 'Course Hours'n; format StudentUID $char5. 'Initial Program'n $char20. Level $char10. CourseName $char15. Semester $char10. 'Course Hours'n; datalines; 1001 AutoMechanics 01 Brakes Fall 3 1001 AutoMechanics 01 Suspension Fall 4 1001 AutoMechanics 01 Electrical Fall 4 1001 AutoMechanics 01 Suspension Winter 4 1001 AutoMechanics 01 Electrical Winter 4 1001 AutoMechanics 01 Suspension Spring 4 1001 AutoMechanics 01 FuelSystem Spring 2 ;
Step 1: get rid of those stupid name literals, use SAS names instead. Makes coding so much easier
Step 2: character variables need no formats.
Step 3: use a sub-select that deduplicates on a course level; I use the MAX summary function, bur you may contemplate MEAN if the number of hours changes between semesters
Step 4: sum up the result of step 3 in an outer SELECT.
data List109G4;
input StudentUID :$5. Initial_Program :$20. Level :$10. CourseName :$15. Semester :$10. Course_Hours;
datalines;
1001 AutoMechanics 01 Brakes Fall 3
1001 AutoMechanics 01 Suspension Fall 4
1001 AutoMechanics 01 Electrical Fall 4
1001 AutoMechanics 01 Suspension Winter 4
1001 AutoMechanics 01 Electrical Winter 4
1001 AutoMechanics 01 Suspension Spring 4
1001 AutoMechanics 01 FuelSystem Spring 2
;
proc sql;
create table want as
select
studentuid,
initial_program,
level,
sum(course_hours) as course_hours
from (
select
studentuid,
initial_program,
level,
coursename,
max(course_hours) as course_hours
from list109g4
group by studentuid, initial_program, level, coursename
)
group by studentuid, initial_program, level
;
quit;
The same can be achieved by running a PROC SORT with NODUPKEY (if hours do not change), and a subsequent PROC SUMMARY:
proc sort
data=List109G4
out=dedup
nodupkey
;
by studentuid initial_program level coursename;
run;
proc summary data=dedup nway;
by studentuid initial_program level;
var course_hours;
output out=want2 (drop=_type_ _freq_) sum()=;
run;
You can use a proc summary to dedup also:
proc summary data=List109G4 nway;
class studentuid initial_program level coursename;
var course_hours;
output out=dedup (drop=_type_ _freq_) max()=;
run;
I used CLASS here because the source dataset is not sorted in strict ascending order, you would need a PROC SORT first to use BY. You can always use CLASS instead of BY as long as the summary data can fit into memory.
Hi Kurt.....your suggestion worked like I was hoping and needed to obtained the output I was looking for....Thanks a ton for all your help and suggestions....greatly appreciated.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.