BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kavange
Calcite | Level 5

 

Hello,

 

Using this example, could someone help me figure out how to calculate and add back the sum of 'count' by child and year. In other words, how do we add a column to the table where the sum of 'count' is calculated and repeated for every row with the same combination of 'child' and 'year' values. I'm a new SAS user so please keep it simple!

 

 data child;
   infile datalines dsd truncover;
   input child:32. year:32. Age:32. count:32. wage:32.;
datalines;

0 2017 25 100 12.5

0 2017 30 20 20.4

0 2017 38 60 15

0 2018 22 40 12.5

0 2018 28 80 10.5

0 2017 37 10 15

1 2017 40 40 12.5

1 2017 28 80 10.5

1 2017 35 10 15

1 2018 44 40 12.5

1 2018 22 50 10.5

1 2018 37 10 15

;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

One way

data child;
   input child year Age count wage;
datalines; 
0 2017 25 100 12.5 
0 2017 30 20 20.4 
0 2017 38 60 15 
0 2018 22 40 12.5 
0 2018 28 80 10.5
0 2017 37 10 15
1 2017 40 40 12.5 
1 2017 28 80 10.5
1 2017 35 10 15 
1 2018 44 40 12.5 
1 2018 22 50 10.5
1 2018 37 10 15 
;
proc summary data=child nway;
   class child year;
   var count;
   output out=work.sum (drop=_:) sum=Total;
run;

proc sql;
   create table want as
   select child.child, child.year, child.age, child.count, child.wage
          ,t.total
   from child 
        left join
        work.sum as t
        on child.child=t.child
        and child.year=t.year
   ;
quit;

Please post code in a code box. The forum will reformat pasted text. That was the possible reason why your "as posted" code would not run correctly for the data step. So mine looks a little different than yours.

 

I use the Proc summary to create a set with summed counts and Proc SQL to join things as you have multiple records with the same child and year and SQL does a better job aligning things.

 

The summary could be done in a subquery within the proc sql but I think this approach is "simpler".

 

The alias bit in the proc sql, work.sum as t allows using the letter t as short hand for referencing the data set work.sum. You can't use a two-level data set name with the .<variable name> due to syntax rules. So the alias is needed for work.sum. A single level name for a data set can be used as with the CHILD data set.

 

 

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26
proc summary data=child nway;
    class child;
    var count;
    output out=_stats_ sum=sum_count;
run;
data want;
    merge child _stats_;
    by child;
run;
--
Paige Miller
ballardw
Super User

One way

data child;
   input child year Age count wage;
datalines; 
0 2017 25 100 12.5 
0 2017 30 20 20.4 
0 2017 38 60 15 
0 2018 22 40 12.5 
0 2018 28 80 10.5
0 2017 37 10 15
1 2017 40 40 12.5 
1 2017 28 80 10.5
1 2017 35 10 15 
1 2018 44 40 12.5 
1 2018 22 50 10.5
1 2018 37 10 15 
;
proc summary data=child nway;
   class child year;
   var count;
   output out=work.sum (drop=_:) sum=Total;
run;

proc sql;
   create table want as
   select child.child, child.year, child.age, child.count, child.wage
          ,t.total
   from child 
        left join
        work.sum as t
        on child.child=t.child
        and child.year=t.year
   ;
quit;

Please post code in a code box. The forum will reformat pasted text. That was the possible reason why your "as posted" code would not run correctly for the data step. So mine looks a little different than yours.

 

I use the Proc summary to create a set with summed counts and Proc SQL to join things as you have multiple records with the same child and year and SQL does a better job aligning things.

 

The summary could be done in a subquery within the proc sql but I think this approach is "simpler".

 

The alias bit in the proc sql, work.sum as t allows using the letter t as short hand for referencing the data set work.sum. You can't use a two-level data set name with the .<variable name> due to syntax rules. So the alias is needed for work.sum. A single level name for a data set can be used as with the CHILD data set.

 

 

Reeza
Super User
There are several methods, PROC MEANS + MERGE and SQL being the most common. Both are illustrated by other users but I also have two examples here:
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
hashman
Ammonite | Level 13

@kavange:

1. Proc SQL will compute the aggregates for you and remerge them back with the original data in the same step:

data have ;                                                                             
   input child year Age count wage ;                                                    
   cards ;                                                                              
0 2017 25 100 12.5                                                                      
0 2017 30  20 20.4                                                                      
0 2017 38  60 15                                                                        
0 2018 22  40 12.5                                                                      
0 2018 28  80 10.5                                                                      
0 2017 37  10 15                                                                        
1 2017 40  40 12.5                                                                      
1 2017 28  80 10.5                                                                      
1 2017 35  10 15                                                                        
1 2018 44  40 12.5                                                                      
1 2018 22  50 10.5                                                                      
1 2018 37  10 15                                                                        
run ;                                                                                   
                                                                                        
proc sql ;                                                                              
  create table want as select *, sum (count) as sum_count from have group  child, year ;
quit ;                                                                                  

2. If your input data set were sorted by child and year (which it's not because of record #6 - maybe you didn't intend that), it would be a typical DATA step job for the double DoW-loop:

data have ;                                
   input child year Age count wage ;       
   cards ;                                 
0 2017 25 100 12.5                         
0 2017 30  20 20.4                         
0 2017 38  60 15                           
0 2017 37  10 15                           
0 2018 22  40 12.5                         
0 2018 28  80 10.5                         
1 2017 40  40 12.5                         
1 2017 28  80 10.5                         
1 2017 35  10 15                           
1 2018 44  40 12.5                         
1 2018 22  50 10.5                         
1 2018 37  10 15                           
run ;                                      
                                           
data want ;                                
  do _n_ = 1 by 1 until (last.year) ;      
    set have ;                             
    by child year ;                        
    sum_count = sum (sum_count, count, 0) ;
  end ;                                    
  do _n_ = 1 to _n_ ;                      
    set have ;                             
    output ;                               
  end ;                                    
run ;                                      

3. If your input file is completely unsorted, and especially if you want to do this job while preserving the original record order, then the hash object is just the ticket:

data have ;                                    
   input child year Age count wage ;           
   cards ;                                     
1 2018 22  50 10.5                             
0 2017 30  20 20.4                             
0 2017 38  60 15                               
1 2017 40  40 12.5                             
0 2018 28  80 10.5                             
1 2017 28  80 10.5                             
0 2017 25 100 12.5                             
1 2017 35  10 15                               
0 2018 22  40 12.5                             
1 2018 44  40 12.5                             
0 2017 37  10 15                               
1 2018 37  10 15                               
run ;                                          
                                               
data want ;                                    
  if _n_ = 1 then do ;                         
    dcl hash h () ;                            
    h.definekey ("child", "year") ;            
    h.definedata ("sum_count") ;               
    h.definedone () ;                          
    do until (z) ;                             
      set have end = z ;                       
      if h.find() ne 0 then sum_count = count ;
      else                  sum_count + count ;
      h.replace() ;                            
    end ;                                      
  end ;                                        
  set have ;                                   
  h.find() ;                                   
run ;                                          

Kind regards

Paul D.

 

kavange
Calcite | Level 5

Thank you so much!! This solved my issue.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1438 views
  • 1 like
  • 5 in conversation