So I have this simple fictive grades of some fictive students:
data have;
input id course $ grade credit;
cards;
1 test1 6 5
1 test1 3 0
1 test1 4 0
1 test2 5 0
1 test2 7 5
1 test2 4 0
2 test1 6 5
2 test1 7 5
2 test1 8 5
3 test1 7 5
3 test1 3 0
3 test1 1 0
;
Which, when run in SAS, outputs this table:
But I want this table as output:
Can someone help me out? Thanks in advance!
I would use either proc summary + merge or one data step with dow-loops:
data want1;
if 0 then set have;
length min max earned_credits num_tests 8;
min = 99999;
do _n_ = 1 by 1 until (last.course);
set have;
by id course;
min = min(min, grade);
max = max(max, grade);
num_tests = sum(num_tests, 1);
earned_credits = sum(earned_credits, credit);
end;
do _n_ = 1 by 1 until (last.course);
set have;
by id course;
output;
end;
run;
Depending on the number of observations in your dataset, this will run faster than sql while using less memory.
Hello @SAS_Question,
PROC SQL could be used to compute those summary statistics and it would automatically "remerge" them back to the observations from dataset HAVE, as you want. The only minor difficulty might be to ensure that the order of observations within an ID-COURSE combination in the output dataset is the same as in HAVE. If this sort order doesn't matter, you can use:
proc sql;
create table want as
select *, min(grade) as min, max(grade) as max, sum(credit) as earned_credits, n(grade) as n_tests
from have
group by id, course;
quit;
(I have abbreviated the variable name how_many_times_made_the_test a bit.)
If that sort order is an issue, create a sequential number _seqno in a preliminary DATA step, insert an ORDER BY clause into the above PROC SQL step
order by _seqno;
and drop the number via a dataset option: ... want(drop=_seqno) ... See Re: How can I keep the order of rows the same after a left join? for a complete code example.
(Or rely on the undocumented MONOTONIC() function: order by monotonic(id); -- not recommended).
Alternatively, use PROC SUMMARY and do the remerging in a DATA step. If dataset HAVE is sorted by ID COURSE (as in your sample data), you can use a BY statement in both steps:
proc summary data=have;
by id course;
output out=stats(drop=_:) min(grade)=min max(grade)=max sum(credit)=earned_credits n(grade)=n_tests;
run;
data want;
merge have stats;
by id course;
run;
Edit: Unlike PROC SQL, the SUMMARY procedure has a VAR statement to specify analysis variables. You can use this to shorten the OUTPUT statement a little:
var grade; output out=stats(drop=_:) min=min max=max sum(credit)=earned_credits n=n_tests;
I would use either proc summary + merge or one data step with dow-loops:
data want1;
if 0 then set have;
length min max earned_credits num_tests 8;
min = 99999;
do _n_ = 1 by 1 until (last.course);
set have;
by id course;
min = min(min, grade);
max = max(max, grade);
num_tests = sum(num_tests, 1);
earned_credits = sum(earned_credits, credit);
end;
do _n_ = 1 by 1 until (last.course);
set have;
by id course;
output;
end;
run;
Depending on the number of observations in your dataset, this will run faster than sql while using less memory.
@FreelanceReinhard and @andreas_lds you guys both deserve my respect for your solutions and for your help!! Thank you both!
But I have to go with @andreas_lds solution becayse that one is pure SAS-code. I like that very much. I know SQL is powerfull but I like the simplicity and power of the SAS-code solutions ! Very neat! Thank you both!! Really appriciate the help of both of you guys!!!!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.