BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
asinusdk
Calcite | Level 5
yearmonthdateidv1v2
200111110
200112201
2001112201
2001213211
2002214310
200231301
2002512311
2003213201
2003414201
200351211
2003512310
2003713301
2003814301
200391311
2003101310
20031113301
20031225301

 

From this data, I want to make a different table by summing frequencies from each variable by year and moth.

The expected result will be like this.

yearmonthv1_freqv2_freq
2001112
2001221
2001300
2001400
2001500
2001600
2001700
2001800
2001900
20011000
20011100
20011200
2002100
2002210
2002301
2002400
2002511
2002600
2002700
2002800
2002900
20021000
20021100
20021200
2003100
2003201
2003300
2003401
2003521
2003600
2003701
2003801
2003911
20031010
20031101
2003120

1

 

What codes can I use? 

Thank you so much in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
proc summary data=have nway;
    class year month;
    var v1 v2;
    output out=want sum=v1freq v2freq;
run;

The rows for months that have no data will not appear in the output via this method. If you need them to appear, you would have to augment your original data set so that there is at least one record for each year/month combination.

--
Paige Miller

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26
proc summary data=have nway;
    class year month;
    var v1 v2;
    output out=want sum=v1freq v2freq;
run;

The rows for months that have no data will not appear in the output via this method. If you need them to appear, you would have to augment your original data set so that there is at least one record for each year/month combination.

--
Paige Miller
Reeza
Super User

Use PROC MEANS to summarize your data. You control the levels of aggregating using the CLASS statement. 

If you need more than one level, ie totals or such then you want to look into WAYS and TYPES but otherwise this will get you there:

 

https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic.sas

 


@asinusdk wrote:
year month date id v1 v2
2001 1 1 1 1 0
2001 1 2 2 0 1
2001 1 12 2 0 1
2001 2 13 2 1 1
2002 2 14 3 1 0
2002 3 1 3 0 1
2002 5 12 3 1 1
2003 2 13 2 0 1
2003 4 14 2 0 1
2003 5 1 2 1 1
2003 5 12 3 1 0
2003 7 13 3 0 1
2003 8 14 3 0 1
2003 9 1 3 1 1
2003 10 1 3 1 0
2003 11 13 3 0 1
2003 12 25 3 0 1

 

From this data, I want to make a different table by summing frequencies from each variable by year and moth.

The expected result will be like this.

year month v1_freq v2_freq
2001 1 1 2
2001 2 2 1
2001 3 0 0
2001 4 0 0
2001 5 0 0
2001 6 0 0
2001 7 0 0
2001 8 0 0
2001 9 0 0
2001 10 0 0
2001 11 0 0
2001 12 0 0
2002 1 0 0
2002 2 1 0
2002 3 0 1
2002 4 0 0
2002 5 1 1
2002 6 0 0
2002 7 0 0
2002 8 0 0
2002 9 0 0
2002 10 0 0
2002 11 0 0
2002 12 0 0
2003 1 0 0
2003 2 0 1
2003 3 0 0
2003 4 0 1
2003 5 2 1
2003 6 0 0
2003 7 0 1
2003 8 0 1
2003 9 1 1
2003 10 1 0
2003 11 0 1
2003 12 0

1

 

What codes can I use? 

Thank you so much in advance!


 

hashman
Ammonite | Level 13

@asinusdk:

Assuming that your input data are sorted by YEAR, MONTH, you can use a single DATA step to: 

  1. aggregate within each (YEAR,MONTH) by-group
  2. after the last by-group record is processed:
  3. look ahead into the first record of the next by-group 
  4. use the date from that next record to generate the necessary records to fill the temporal gap

In the SAS language: 

data have ;                                                                
  input year month v1 v2 ;                                                 
  cards ;                                                                  
2001  1  1  0                                                              
2001  1  0  1                                                              
2001  1  0  1                                                              
2001  2  1  1                                                              
2002  2  1  0                                                              
2002  3  0  1                                                              
2002  5  1  1                                                              
2003  2  0  1                                                              
2003  4  0  1                                                              
2003  5  1  1                                                              
2003  5  1  0                                                              
2003  7  0  1                                                              
2003  8  0  1                                                              
2003  9  1  1                                                              
2003 10  1  0                                                              
2003 11  0  1                                                              
2003 12  0  1                                                              
run ;                                                                      
                                                                           
data want (drop = _:) ;                                                    
  set have (keep = year month v:) ;                                        
  by year month ;                                                          
  v1_freq + v1 ;                                                           
  v2_freq + v2 ;                                                           
  if last.month ;                                                          
  output ;                                                                 
  v1_freq = 0 ;                                                            
  v2_freq = 0 ;                                                            
  if _n_ < n ;                                                             
  p = _n_ + 1 ;                                                            
  set have (keep = year month rename = (year=_y month=_m)) point=p nobs=n ;
  _dt = mdy (month, 1, year) ;                                             
  do _q = 2 to intck ("mon", _dt, mdy (_m, 1, _y)) ;                       
    _dt = intnx ("mon", _dt, 1) ;                                          
    year = year (_dt) ;                                                    
    month = month (_dt) ;                                                  
    output ;                                                               
  end ;                                                                    
run ;                                                                      

If the input data set isn't sorted, basically the same gap-filling scheme can be enacted using the hash object by:

  1. making it aggregate the sums
  2. augmenting the hash table with the dates not found in the table and coupled with zero sums

In SAS words:

data _null_ ;                                                           
  dcl hash h (ordered:"a") ;                                            
  h.definekey ("year", "month") ;                                       
  h.definedata ("year", "month", "v1_freq", "v2_freq") ;                
  h.definedone () ;                                                     
  do until (z) ;                                                        
    set have (keep = year month v:) end = z ;                           
    if h.find() ne 0 then call missing (v1_freq, v2_freq) ;             
    v1_freq + v1 ;                                                      
    v2_freq + v2 ;                                                      
    h.replace() ;                                                       
    _ldt = min (_ldt, mdy (month, 1, year)) ;                           
    _hdt = max (_hdt, mdy (month, 1, year)) ;                           
  end ;                                                                 
  do _n_ = 1 to intck ("mon", _ldt, _hdt) ;                             
    _ldt = intnx ("mon", _ldt, 1) ;                                     
    year = year (_ldt) ;                                                
    month = month (_ldt) ;                                              
    h.ref (key:year, key:month, data:year, data:month, data:0, data:0) ;
  end ;                                                                 
  h.output (dataset:"want") ;                                          
run ;                                                                   

 Of course, the problem can be solved using either MEANS/SUMMARY, as suggested earlier, or SQL, with the aggregated result set left-joined with a separately prepared data set containing the fill-gap (YEAR,MONTH) and zero sums. Preparing such a data set is principally no different from running a DO loop similar to the DO _N_= loop above. The advantage of the hash object is that the entire processing can be done in a single step and a single pass through the input data, regardless of the input key sequence.

 

Paul D.

singhsahab
Lapis Lazuli | Level 10
Data have ;                                                                
  input year month v1 v2 ;                                                 
  cards ;                                                                  
2001  1  1  0                                                              
2001  1  0  1                                                              
2001  1  0  1                                                              
2001  2  1  1                                                              
2002  2  1  0                                                              
2002  3  0  1                                                              
2002  5  1  1                                                              
2003  2  0  1                                                              
2003  4  0  1                                                              
2003  5  1  1                                                              
2003  5  1  0                                                              
2003  7  0  1                                                              
2003  8  0  1                                                              
2003  9  1  1                                                              
2003 10  1  0                                                              
2003 11  0  1                                                              
2003 12  0  1  
;                                                            
run ;                                                                      
      
PROC SQL;
CREATE TABLE WANT AS
SELECT YEAR,MONTH,SUM(V1) AS FR_V1,SUM(V2) AS FR_V2 FROM HAVE GROUP BY YEAR ,MONTH;
QUIT;

If you wanted to use PROC SQL above code can help. if wanted to see all month data at least one entry is needed for each month per year. 

 

Thanks...

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1573 views
  • 0 likes
  • 5 in conversation