I'm wondering what's the best way to create a variable that contains a column's total sum inside of a data step. Can this be easily achieved without proc sql?
As an example, I'd like to create a new variable, total_age, that contains the sum of all the students' ages, in the sashelp.class dataset. My code below creates a cumulative sum, but how would I revise to create a total so that each row displays the total age? Thank you.
data sas_intro;
set sashelp.class;
retain;
total_age = sum(total_age, age); /*creates cumulative sum, not total)*/
run;
A data step approach
data want;
if _N_ = 1 then do;
do until (z);
set sashelp.class end = z;
total_age = sum(total_age, age);
retain total_age;
end;
end;
set sashelp.class;
run;
It's either two steps in a data step (calculation + merge) or a DoW loop. Programmatically, SQL is much easier here for sure.
Depending on what you're doing, for example creating a report, the reporting procedure may be better suited to doing that as well.
data sas_intro;
do _n_ =1 by 1 until(eof);
set sashelp.class end=eof;
total_age = sum(total_age, age); /*creates cumulative sum, not total)*/
end;
do _n_ =1 by 1 until(eof2);
set sashelp.class end=eof2;
output;
end;
run;
Other methods illustrated here:
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
An SQL approach
proc sql;
create table want as
select *
, sum(age) as total_age
from sashelp.class
;
quit;
A data step approach
data want;
if _N_ = 1 then do;
do until (z);
set sashelp.class end = z;
total_age = sum(total_age, age);
retain total_age;
end;
end;
set sashelp.class;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.