BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kavange
Calcite | Level 5

 

Hello,

 

Using this example, could someone help me figure out how to calculate and add back the sum of 'count' by child and year. In other words, how do we add a column to the table where the sum of 'count' is calculated and repeated for every row with the same combination of 'child' and 'year' values. I'm a new SAS user so please keep it simple!

 

 data child;
   infile datalines dsd truncover;
   input child:32. year:32. Age:32. count:32. wage:32.;
datalines;

0 2017 25 100 12.5

0 2017 30 20 20.4

0 2017 38 60 15

0 2018 22 40 12.5

0 2018 28 80 10.5

0 2017 37 10 15

1 2017 40 40 12.5

1 2017 28 80 10.5

1 2017 35 10 15

1 2018 44 40 12.5

1 2018 22 50 10.5

1 2018 37 10 15

;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

One way

data child;
   input child year Age count wage;
datalines; 
0 2017 25 100 12.5 
0 2017 30 20 20.4 
0 2017 38 60 15 
0 2018 22 40 12.5 
0 2018 28 80 10.5
0 2017 37 10 15
1 2017 40 40 12.5 
1 2017 28 80 10.5
1 2017 35 10 15 
1 2018 44 40 12.5 
1 2018 22 50 10.5
1 2018 37 10 15 
;
proc summary data=child nway;
   class child year;
   var count;
   output out=work.sum (drop=_:) sum=Total;
run;

proc sql;
   create table want as
   select child.child, child.year, child.age, child.count, child.wage
          ,t.total
   from child 
        left join
        work.sum as t
        on child.child=t.child
        and child.year=t.year
   ;
quit;

Please post code in a code box. The forum will reformat pasted text. That was the possible reason why your "as posted" code would not run correctly for the data step. So mine looks a little different than yours.

 

I use the Proc summary to create a set with summed counts and Proc SQL to join things as you have multiple records with the same child and year and SQL does a better job aligning things.

 

The summary could be done in a subquery within the proc sql but I think this approach is "simpler".

 

The alias bit in the proc sql, work.sum as t allows using the letter t as short hand for referencing the data set work.sum. You can't use a two-level data set name with the .<variable name> due to syntax rules. So the alias is needed for work.sum. A single level name for a data set can be used as with the CHILD data set.

 

 

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26
proc summary data=child nway;
    class child;
    var count;
    output out=_stats_ sum=sum_count;
run;
data want;
    merge child _stats_;
    by child;
run;
--
Paige Miller
ballardw
Super User

One way

data child;
   input child year Age count wage;
datalines; 
0 2017 25 100 12.5 
0 2017 30 20 20.4 
0 2017 38 60 15 
0 2018 22 40 12.5 
0 2018 28 80 10.5
0 2017 37 10 15
1 2017 40 40 12.5 
1 2017 28 80 10.5
1 2017 35 10 15 
1 2018 44 40 12.5 
1 2018 22 50 10.5
1 2018 37 10 15 
;
proc summary data=child nway;
   class child year;
   var count;
   output out=work.sum (drop=_:) sum=Total;
run;

proc sql;
   create table want as
   select child.child, child.year, child.age, child.count, child.wage
          ,t.total
   from child 
        left join
        work.sum as t
        on child.child=t.child
        and child.year=t.year
   ;
quit;

Please post code in a code box. The forum will reformat pasted text. That was the possible reason why your "as posted" code would not run correctly for the data step. So mine looks a little different than yours.

 

I use the Proc summary to create a set with summed counts and Proc SQL to join things as you have multiple records with the same child and year and SQL does a better job aligning things.

 

The summary could be done in a subquery within the proc sql but I think this approach is "simpler".

 

The alias bit in the proc sql, work.sum as t allows using the letter t as short hand for referencing the data set work.sum. You can't use a two-level data set name with the .<variable name> due to syntax rules. So the alias is needed for work.sum. A single level name for a data set can be used as with the CHILD data set.

 

 

Reeza
Super User
There are several methods, PROC MEANS + MERGE and SQL being the most common. Both are illustrated by other users but I also have two examples here:
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
hashman
Ammonite | Level 13

@kavange:

1. Proc SQL will compute the aggregates for you and remerge them back with the original data in the same step:

data have ;                                                                             
   input child year Age count wage ;                                                    
   cards ;                                                                              
0 2017 25 100 12.5                                                                      
0 2017 30  20 20.4                                                                      
0 2017 38  60 15                                                                        
0 2018 22  40 12.5                                                                      
0 2018 28  80 10.5                                                                      
0 2017 37  10 15                                                                        
1 2017 40  40 12.5                                                                      
1 2017 28  80 10.5                                                                      
1 2017 35  10 15                                                                        
1 2018 44  40 12.5                                                                      
1 2018 22  50 10.5                                                                      
1 2018 37  10 15                                                                        
run ;                                                                                   
                                                                                        
proc sql ;                                                                              
  create table want as select *, sum (count) as sum_count from have group  child, year ;
quit ;                                                                                  

2. If your input data set were sorted by child and year (which it's not because of record #6 - maybe you didn't intend that), it would be a typical DATA step job for the double DoW-loop:

data have ;                                
   input child year Age count wage ;       
   cards ;                                 
0 2017 25 100 12.5                         
0 2017 30  20 20.4                         
0 2017 38  60 15                           
0 2017 37  10 15                           
0 2018 22  40 12.5                         
0 2018 28  80 10.5                         
1 2017 40  40 12.5                         
1 2017 28  80 10.5                         
1 2017 35  10 15                           
1 2018 44  40 12.5                         
1 2018 22  50 10.5                         
1 2018 37  10 15                           
run ;                                      
                                           
data want ;                                
  do _n_ = 1 by 1 until (last.year) ;      
    set have ;                             
    by child year ;                        
    sum_count = sum (sum_count, count, 0) ;
  end ;                                    
  do _n_ = 1 to _n_ ;                      
    set have ;                             
    output ;                               
  end ;                                    
run ;                                      

3. If your input file is completely unsorted, and especially if you want to do this job while preserving the original record order, then the hash object is just the ticket:

data have ;                                    
   input child year Age count wage ;           
   cards ;                                     
1 2018 22  50 10.5                             
0 2017 30  20 20.4                             
0 2017 38  60 15                               
1 2017 40  40 12.5                             
0 2018 28  80 10.5                             
1 2017 28  80 10.5                             
0 2017 25 100 12.5                             
1 2017 35  10 15                               
0 2018 22  40 12.5                             
1 2018 44  40 12.5                             
0 2017 37  10 15                               
1 2018 37  10 15                               
run ;                                          
                                               
data want ;                                    
  if _n_ = 1 then do ;                         
    dcl hash h () ;                            
    h.definekey ("child", "year") ;            
    h.definedata ("sum_count") ;               
    h.definedone () ;                          
    do until (z) ;                             
      set have end = z ;                       
      if h.find() ne 0 then sum_count = count ;
      else                  sum_count + count ;
      h.replace() ;                            
    end ;                                      
  end ;                                        
  set have ;                                   
  h.find() ;                                   
run ;                                          

Kind regards

Paul D.

 

kavange
Calcite | Level 5

Thank you so much!! This solved my issue.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1343 views
  • 1 like
  • 5 in conversation