Hi everyone,
I'm having a problem to generate a more efficient code to sum a series of variable of the same type seperately (STMinTenday_grp1-STMinTenday_grp37) by the same group. Below is my original code. I've tried using proc sql combined do loop or data step with array. But neither of my plan worked. Hopefully, someone here could point me to a right direction in doing this. Thanks a lot!
proc sql;
create table temp3.test as
select *,
sum(STMinTenday_grp1)as sum_STMinTenday_grp1, sum(STMinTenday_grp2)as sum_STMinTenday_grp2,
sum(STMinTenday_grp3)as sum_STMinTenday_grp3, sum(STMinTenday_grp4)as sum_STMinTenday_grp4,
sum(STMinTenday_grp5)as sum_STMinTenday_grp5, sum(STMinTenday_grp6)as sum_STMinTenday_grp6,
sum(STMinTenday_grp7)as sum_STMinTenday_grp7, sum(STMinTenday_grp8)as sum_STMinTenday_grp8,
sum(STMinTenday_grp9)as sum_STMinTenday_grp9, sum(STMinTenday_grp10)as sum_STMinTenday_grp10,
sum(STMinTenday_grp11)as sum_STMinTenday_grp11, sum(STMinTenday_grp12)as sum_STMinTenday_grp12,
sum(STMinTenday_grp13)as sum_STMinTenday_grp13, sum(STMinTenday_grp14)as sum_STMinTenday_grp14,
sum(STMinTenday_grp15)as sum_STMinTenday_grp15, sum(STMinTenday_grp16)as sum_STMinTenday_grp16,
sum(STMinTenday_grp17)as sum_STMinTenday_grp17, sum(STMinTenday_grp18)as sum_STMinTenday_grp18,
sum(STMinTenday_grp19)as sum_STMinTenday_grp19, sum(STMinTenday_grp20)as sum_STMinTenday_grp20,
sum(STMinTenday_grp21)as sum_STMinTenday_grp21, sum(STMinTenday_grp22)as sum_STMinTenday_grp22,
sum(STMinTenday_grp23)as sum_STMinTenday_grp23, sum(STMinTenday_grp24)as sum_STMinTenday_grp24,
sum(STMinTenday_grp25)as sum_STMinTenday_grp25, sum(STMinTenday_grp26)as sum_STMinTenday_grp26,
sum(STMinTenday_grp27)as sum_STMinTenday_grp27, sum(STMinTenday_grp28)as sum_STMinTenday_grp28,
sum(STMinTenday_grp29)as sum_STMinTenday_grp29, sum(STMinTenday_grp30)as sum_STMinTenday_grp30,
sum(STMinTenday_grp31)as sum_STMinTenday_grp31, sum(STMinTenday_grp32)as sum_STMinTenday_grp32,
sum(STMinTenday_grp33)as sum_STMinTenday_grp33, sum(STMinTenday_grp34)as sum_STMinTenday_grp34,
sum(STMinTenday_grp35)as sum_STMinTenday_grp35, sum(STMinTenday_grp36)as sum_STMinTenday_grp36,
sum(STMinTenday_grp37)as sum_STMinTenday_grp37
from temp3.try10day
group by bene_id;
quit;
You can use PROC MEANS and put the group variable on a CLASS statement. The output variables will have the variables
GROUP var1_Sum var2_Sum var3_sum...
Here's an example:
proc means data=sashelp.class noprint;
class sex;
vars height weight age;
output out=out sum= / autoname;
run;
If you need the group levels to be part of the variable name, you'll have to run an additional step. Let us know if this looks like something you want to pursue.
What's the next step? Maybe it can be incorporated into proc means or the data step solution.
If you are going to merge the statistics back into the original data set, then I agree that sorting and using a BY statement (and NWAY option) is worthwhile:
proc sort data=sashelp.class out=have;
by sex;
run;
proc means data=have noprint nway;
by sex;
vars height weight age;
output out=out sum= / autoname;
run;
data want;
merge have out;
by sex;
run;
Your post mentions that you tried a DATA step. In that case, your data must be in sorted order by BENE_ID.
Under those conditions, here is an approach you could try:
data want;
array previous {37} STMinTenDay_grp1 - STMinTenDay_grp37;
array totals {37} sum_STMinTenDay_grp1 - sum_STMinTenDay_grp37;
do _n_=1 to 37;
totals{_n_}=0;
end;
do until (last.bene_id);
set temp3.try10day;
by bene_id;
do _n_=1 to 37;
totals{_n_} + previous{_n_};
end;
end;
*** Totals have been calculated, now apply them to each record;
do until (last.bene_id);
set temp3.try10day;
by bene_id;
*** Do some calculations here;
output;
end;
run;
Also note, just in case you want to stick with PROC SUMMARY ... when your data set is already sorted, the BY statement is often faster than the CLASS statement. (When you only have a few records per BENE_ID, the reverse can be true making CLASS faster than BY.) If you do use the CLASS statement, you may need to add the word NWAY to the end of the PROC statement to avoid calculating an extra observation holding grand totals across all values of BENE_ID.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.