Hello,
Using this example, could someone help me figure out how to calculate and add back the sum of 'count' by child and year. In other words, how do we add a column to the table where the sum of 'count' is calculated and repeated for every row with the same combination of 'child' and 'year' values. I'm a new SAS user so please keep it simple!
data child;
infile datalines dsd truncover;
input child:32. year:32. Age:32. count:32. wage:32.;
datalines;
0 2017 25 100 12.5
0 2017 30 20 20.4
0 2017 38 60 15
0 2018 22 40 12.5
0 2018 28 80 10.5
0 2017 37 10 15
1 2017 40 40 12.5
1 2017 28 80 10.5
1 2017 35 10 15
1 2018 44 40 12.5
1 2018 22 50 10.5
1 2018 37 10 15
;
One way
data child; input child year Age count wage; datalines; 0 2017 25 100 12.5 0 2017 30 20 20.4 0 2017 38 60 15 0 2018 22 40 12.5 0 2018 28 80 10.5 0 2017 37 10 15 1 2017 40 40 12.5 1 2017 28 80 10.5 1 2017 35 10 15 1 2018 44 40 12.5 1 2018 22 50 10.5 1 2018 37 10 15 ; proc summary data=child nway; class child year; var count; output out=work.sum (drop=_:) sum=Total; run; proc sql; create table want as select child.child, child.year, child.age, child.count, child.wage ,t.total from child left join work.sum as t on child.child=t.child and child.year=t.year ; quit;
Please post code in a code box. The forum will reformat pasted text. That was the possible reason why your "as posted" code would not run correctly for the data step. So mine looks a little different than yours.
I use the Proc summary to create a set with summed counts and Proc SQL to join things as you have multiple records with the same child and year and SQL does a better job aligning things.
The summary could be done in a subquery within the proc sql but I think this approach is "simpler".
The alias bit in the proc sql, work.sum as t allows using the letter t as short hand for referencing the data set work.sum. You can't use a two-level data set name with the .<variable name> due to syntax rules. So the alias is needed for work.sum. A single level name for a data set can be used as with the CHILD data set.
proc summary data=child nway;
class child;
var count;
output out=_stats_ sum=sum_count;
run;
data want;
merge child _stats_;
by child;
run;
One way
data child; input child year Age count wage; datalines; 0 2017 25 100 12.5 0 2017 30 20 20.4 0 2017 38 60 15 0 2018 22 40 12.5 0 2018 28 80 10.5 0 2017 37 10 15 1 2017 40 40 12.5 1 2017 28 80 10.5 1 2017 35 10 15 1 2018 44 40 12.5 1 2018 22 50 10.5 1 2018 37 10 15 ; proc summary data=child nway; class child year; var count; output out=work.sum (drop=_:) sum=Total; run; proc sql; create table want as select child.child, child.year, child.age, child.count, child.wage ,t.total from child left join work.sum as t on child.child=t.child and child.year=t.year ; quit;
Please post code in a code box. The forum will reformat pasted text. That was the possible reason why your "as posted" code would not run correctly for the data step. So mine looks a little different than yours.
I use the Proc summary to create a set with summed counts and Proc SQL to join things as you have multiple records with the same child and year and SQL does a better job aligning things.
The summary could be done in a subquery within the proc sql but I think this approach is "simpler".
The alias bit in the proc sql, work.sum as t allows using the letter t as short hand for referencing the data set work.sum. You can't use a two-level data set name with the .<variable name> due to syntax rules. So the alias is needed for work.sum. A single level name for a data set can be used as with the CHILD data set.
1. Proc SQL will compute the aggregates for you and remerge them back with the original data in the same step:
data have ;
input child year Age count wage ;
cards ;
0 2017 25 100 12.5
0 2017 30 20 20.4
0 2017 38 60 15
0 2018 22 40 12.5
0 2018 28 80 10.5
0 2017 37 10 15
1 2017 40 40 12.5
1 2017 28 80 10.5
1 2017 35 10 15
1 2018 44 40 12.5
1 2018 22 50 10.5
1 2018 37 10 15
run ;
proc sql ;
create table want as select *, sum (count) as sum_count from have group child, year ;
quit ;
2. If your input data set were sorted by child and year (which it's not because of record #6 - maybe you didn't intend that), it would be a typical DATA step job for the double DoW-loop:
data have ;
input child year Age count wage ;
cards ;
0 2017 25 100 12.5
0 2017 30 20 20.4
0 2017 38 60 15
0 2017 37 10 15
0 2018 22 40 12.5
0 2018 28 80 10.5
1 2017 40 40 12.5
1 2017 28 80 10.5
1 2017 35 10 15
1 2018 44 40 12.5
1 2018 22 50 10.5
1 2018 37 10 15
run ;
data want ;
do _n_ = 1 by 1 until (last.year) ;
set have ;
by child year ;
sum_count = sum (sum_count, count, 0) ;
end ;
do _n_ = 1 to _n_ ;
set have ;
output ;
end ;
run ;
3. If your input file is completely unsorted, and especially if you want to do this job while preserving the original record order, then the hash object is just the ticket:
data have ;
input child year Age count wage ;
cards ;
1 2018 22 50 10.5
0 2017 30 20 20.4
0 2017 38 60 15
1 2017 40 40 12.5
0 2018 28 80 10.5
1 2017 28 80 10.5
0 2017 25 100 12.5
1 2017 35 10 15
0 2018 22 40 12.5
1 2018 44 40 12.5
0 2017 37 10 15
1 2018 37 10 15
run ;
data want ;
if _n_ = 1 then do ;
dcl hash h () ;
h.definekey ("child", "year") ;
h.definedata ("sum_count") ;
h.definedone () ;
do until (z) ;
set have end = z ;
if h.find() ne 0 then sum_count = count ;
else sum_count + count ;
h.replace() ;
end ;
end ;
set have ;
h.find() ;
run ;
Kind regards
Paul D.
Thank you so much!! This solved my issue.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.