Solved: How do I calculated and add back totals by group

kavange · Posted 06-21-2019 11:04 AM

Hello,

Using this example, could someone help me figure out how to calculate and add back the sum of 'count' by child and year. In other words, how do we add a column to the table where the sum of 'count' is calculated and repeated for every row with the same combination of 'child' and 'year' values. I'm a new SAS user so please keep it simple!

data child;
infile datalines dsd truncover;
input child:32. year:32. Age:32. count:32. wage:32.;
datalines;

0 2017 25 100 12.5

0 2017 30 20 20.4

0 2017 38 60 15

0 2018 22 40 12.5

0 2018 28 80 10.5

0 2017 37 10 15

1 2017 40 40 12.5

1 2017 28 80 10.5

1 2017 35 10 15

1 2018 44 40 12.5

1 2018 22 50 10.5

1 2018 37 10 15

;

ballardw · Posted 06-21-2019 11:17 AM

One way

data child;
   input child year Age count wage;
datalines; 
0 2017 25 100 12.5 
0 2017 30 20 20.4 
0 2017 38 60 15 
0 2018 22 40 12.5 
0 2018 28 80 10.5
0 2017 37 10 15
1 2017 40 40 12.5 
1 2017 28 80 10.5
1 2017 35 10 15 
1 2018 44 40 12.5 
1 2018 22 50 10.5
1 2018 37 10 15 
;
proc summary data=child nway;
   class child year;
   var count;
   output out=work.sum (drop=_:) sum=Total;
run;

proc sql;
   create table want as
   select child.child, child.year, child.age, child.count, child.wage
          ,t.total
   from child 
        left join
        work.sum as t
        on child.child=t.child
        and child.year=t.year
   ;
quit;

Please post code in a code box. The forum will reformat pasted text. That was the possible reason why your "as posted" code would not run correctly for the data step. So mine looks a little different than yours.

I use the Proc summary to create a set with summed counts and Proc SQL to join things as you have multiple records with the same child and year and SQL does a better job aligning things.

The summary could be done in a subquery within the proc sql but I think this approach is "simpler".

The alias bit in the proc sql, work.sum as t allows using the letter t as short hand for referencing the data set work.sum. You can't use a two-level data set name with the .<variable name> due to syntax rules. So the alias is needed for work.sum. A single level name for a data set can be used as with the CHILD data set.

View solution in original post

PaigeMiller · Posted 06-21-2019 11:10 AM

proc summary data=child nway;
    class child;
    var count;
    output out=_stats_ sum=sum_count;
run;
data want;
    merge child _stats_;
    by child;
run;

--
Paige Miller

ballardw · Posted 06-21-2019 11:17 AM

One way

data child;
   input child year Age count wage;
datalines; 
0 2017 25 100 12.5 
0 2017 30 20 20.4 
0 2017 38 60 15 
0 2018 22 40 12.5 
0 2018 28 80 10.5
0 2017 37 10 15
1 2017 40 40 12.5 
1 2017 28 80 10.5
1 2017 35 10 15 
1 2018 44 40 12.5 
1 2018 22 50 10.5
1 2018 37 10 15 
;
proc summary data=child nway;
   class child year;
   var count;
   output out=work.sum (drop=_:) sum=Total;
run;

proc sql;
   create table want as
   select child.child, child.year, child.age, child.count, child.wage
          ,t.total
   from child 
        left join
        work.sum as t
        on child.child=t.child
        and child.year=t.year
   ;
quit;

Please post code in a code box. The forum will reformat pasted text. That was the possible reason why your "as posted" code would not run correctly for the data step. So mine looks a little different than yours.

I use the Proc summary to create a set with summed counts and Proc SQL to join things as you have multiple records with the same child and year and SQL does a better job aligning things.

The summary could be done in a subquery within the proc sql but I think this approach is "simpler".

The alias bit in the proc sql, work.sum as t allows using the letter t as short hand for referencing the data set work.sum. You can't use a two-level data set name with the .<variable name> due to syntax rules. So the alias is needed for work.sum. A single level name for a data set can be used as with the CHILD data set.

Reeza · Posted 06-21-2019 11:32 AM

There are several methods, PROC MEANS + MERGE and SQL being the most common. Both are illustrated by other users but I also have two examples here:
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

hashman · Posted 06-21-2019 03:05 PM

@kavange:

1. Proc SQL will compute the aggregates for you and remerge them back with the original data in the same step:

data have ;                                                                             
   input child year Age count wage ;                                                    
   cards ;                                                                              
0 2017 25 100 12.5                                                                      
0 2017 30  20 20.4                                                                      
0 2017 38  60 15                                                                        
0 2018 22  40 12.5                                                                      
0 2018 28  80 10.5                                                                      
0 2017 37  10 15                                                                        
1 2017 40  40 12.5                                                                      
1 2017 28  80 10.5                                                                      
1 2017 35  10 15                                                                        
1 2018 44  40 12.5                                                                      
1 2018 22  50 10.5                                                                      
1 2018 37  10 15                                                                        
run ;                                                                                   
                                                                                        
proc sql ;                                                                              
  create table want as select *, sum (count) as sum_count from have group  child, year ;
quit ;

2. If your input data set were sorted by child and year (which it's not because of record #6 - maybe you didn't intend that), it would be a typical DATA step job for the double DoW-loop:

data have ;                                
   input child year Age count wage ;       
   cards ;                                 
0 2017 25 100 12.5                         
0 2017 30  20 20.4                         
0 2017 38  60 15                           
0 2017 37  10 15                           
0 2018 22  40 12.5                         
0 2018 28  80 10.5                         
1 2017 40  40 12.5                         
1 2017 28  80 10.5                         
1 2017 35  10 15                           
1 2018 44  40 12.5                         
1 2018 22  50 10.5                         
1 2018 37  10 15                           
run ;                                      
                                           
data want ;                                
  do _n_ = 1 by 1 until (last.year) ;      
    set have ;                             
    by child year ;                        
    sum_count = sum (sum_count, count, 0) ;
  end ;                                    
  do _n_ = 1 to _n_ ;                      
    set have ;                             
    output ;                               
  end ;                                    
run ;

3. If your input file is completely unsorted, and especially if you want to do this job while preserving the original record order, then the hash object is just the ticket:

data have ;                                    
   input child year Age count wage ;           
   cards ;                                     
1 2018 22  50 10.5                             
0 2017 30  20 20.4                             
0 2017 38  60 15                               
1 2017 40  40 12.5                             
0 2018 28  80 10.5                             
1 2017 28  80 10.5                             
0 2017 25 100 12.5                             
1 2017 35  10 15                               
0 2018 22  40 12.5                             
1 2018 44  40 12.5                             
0 2017 37  10 15                               
1 2018 37  10 15                               
run ;                                          
                                               
data want ;                                    
  if _n_ = 1 then do ;                         
    dcl hash h () ;                            
    h.definekey ("child", "year") ;            
    h.definedata ("sum_count") ;               
    h.definedone () ;                          
    do until (z) ;                             
      set have end = z ;                       
      if h.find() ne 0 then sum_count = count ;
      else                  sum_count + count ;
      h.replace() ;                            
    end ;                                      
  end ;                                        
  set have ;                                   
  h.find() ;                                   
run ;

Kind regards

Paul D.

kavange · Posted 06-21-2019 03:10 PM

Thank you so much!! This solved my issue.

How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Registration is open

How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Re: How do I calculated and add back totals by group

Registration is open

SAS Training: Just a Click Away